You’ll automate redirects,
handle AI hallucinations more
effectively, and understand
what embeddings are
We don’t speak the
same language
Humans think in concepts, computers
process data.
Source: Google Trends (query: vector embeddings)
@frankvndijk
What are embeddings?
“Embeddings are numerical representations of
data (like words, images, or audio) in a multi-
dimensional space”
Text, images, audio → Embedding model → 0.9 0.7 0.2 0.6
cat → Embedding model → 0.9 0.7 -0.3 0.6
dog → Embedding model → 0.9 0.6 -0.2 0.8
pet → Embedding model → 0.9 0.7 -0.2 0.9
lion → Embedding model → 0.9 0.2 0.8, with a fourth value of -0.2 whose position in the vector is not recoverable
What is cosine similarity?
“Cosine similarity measures the angle between
two embeddings in a multi-dimensional space
to determine how similar they are”
[Figure: the embeddings for cat and dog]
Cosine similarity
Always a score between 0 (or -1) and 1
0 = No similarity
1 = Identical
Similarities really arise
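To make the definition concrete, here is a minimal sketch of cosine similarity in plain Python, applied to the toy four-dimensional cat and dog vectors from the earlier slides. The function name `cosine_similarity` is chosen for illustration, and real embeddings have hundreds or thousands of dimensions rather than four.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors taken from the slides; not output of a real embedding model.
cat = [0.9, 0.7, -0.3, 0.6]
dog = [0.9, 0.6, -0.2, 0.8]

print(f"cat vs dog: {cosine_similarity(cat, dog):.3f}")
print(f"cat vs cat: {cosine_similarity(cat, cat):.3f}")
```

With these numbers, cat and dog score roughly 0.98, while a vector compared with itself scores 1. Vectors pointing in opposite directions score -1, which is where the "(or -1)" on the slide comes from.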
[Figure: dog, cat, lion, shark, fox and meerkat plotted in the embedding space]
Practical examples
● Chatbots
● Recommendation systems
● Search engines
Search query ‘spring’
Search query ‘spring season’
“We no longer live in a keyword era, but
in an era of search intent”
since 2013…
What is a vector database?
"A vector database stores data as high-
dimensional vectors for efficient similarity
searches and AI applications."
0.4 0.8 0.3 0.7 0.3 0.7 0.4 0.6 0.5 0.6 0.5 0.5
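As a rough illustration of what a vector database does, the sketch below stores vectors in memory and returns the closest matches for a query vector. `TinyVectorStore` is a hypothetical stand-in written for this example, not a real product, and it reuses the toy cat, dog and pet vectors from the earlier slides. It scans every stored vector, whereas production vector databases typically rely on approximate nearest-neighbour indexes to stay fast at scale.

```python
import math


class TinyVectorStore:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self):
        self._items = []  # list of (item_id, unit-length vector)

    @staticmethod
    def _normalize(vector):
        length = math.sqrt(sum(x * x for x in vector))
        if length == 0:
            raise ValueError("cannot store or query a zero vector")
        return [x / length for x in vector]

    def add(self, item_id, vector):
        # Storing unit-length vectors turns cosine similarity into a dot product.
        self._items.append((item_id, self._normalize(vector)))

    def search(self, query_vector, top_k=3):
        """Return the top_k (item_id, cosine similarity) pairs, best first."""
        query = self._normalize(query_vector)
        scored = [
            (item_id, sum(q * v for q, v in zip(query, vector)))
            for item_id, vector in self._items
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


store = TinyVectorStore()
store.add("cat", [0.9, 0.7, -0.3, 0.6])
store.add("dog", [0.9, 0.6, -0.2, 0.8])
store.add("pet", [0.9, 0.7, -0.2, 0.9])

# Query with the cat vector and ask for the two closest items.
results = store.search([0.9, 0.7, -0.3, 0.6], top_k=2)
for item_id, score in results:
    print(item_id, round(score, 3))
```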
Dataset → Embedding model
User question → Embedding model → Relevant data → LLM → Answer
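The flow on this slide can be sketched in a few lines: embed the question, find the most relevant piece of the dataset, and hand both to the LLM as a prompt. Everything named here is an assumption for illustration: `toy_embed` is a word-counting stand-in for a real embedding model, `VOCABULARY` and the dataset sentences are made up, and the final call to the LLM is left out.

```python
import math

# Hypothetical vocabulary for the stand-in embedding below.
VOCABULARY = ["redirect", "embedding", "link", "crawl", "price"]


def toy_embed(text):
    """Stand-in for a real embedding model: counts vocabulary words."""
    words = text.lower().split()
    return [sum(1 for word in words if word.startswith(term)) for term in VOCABULARY]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def retrieve(question, chunks, top_k=1):
    """Return the top_k chunks most similar to the question."""
    # A real setup would embed the dataset once and store the vectors;
    # this sketch re-embeds every chunk on each call for brevity.
    query = toy_embed(question)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query, toy_embed(chunk)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, chunks, top_k=1):
    """Combine the retrieved context and the question into one LLM prompt."""
    context = "\n".join(retrieve(question, chunks, top_k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


dataset = [
    "A redirect sends visitors from an old URL to a new URL.",
    "An embedding turns page content into a list of numbers.",
    "An internal link connects two pages on the same site.",
]
prompt = build_prompt("How does a redirect work?", dataset)
print(prompt)
```

Because the LLM only sees the retrieved context, you decide which data it can draw on when answering.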
Limits of ChatGPT
128,000 tokens context window: +/- 96,000 words
4,096 token limit: +/- 3,000 words
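The word counts on this slide follow from a rough conversion between tokens and words. The sketch below assumes about 0.75 English words per token, a common rule of thumb rather than an exact figure; the real ratio depends on the language and the tokenizer.

```python
WORDS_PER_TOKEN = 0.75  # assumption: rough rule of thumb for English text


def tokens_to_words(tokens):
    """Estimate how many words fit in a given number of tokens."""
    return round(tokens * WORDS_PER_TOKEN)


print(tokens_to_words(128_000))  # 96000
print(tokens_to_words(4_096))    # 3072, which the slide rounds to 3,000
```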
Connect ChatGPT to our database
User question → Embedding model → 0.4 0.8 0.3 0.7
This has major advantages
● It helps to prevent hallucinations
● You have control over what data you use
● You can use real-time or new data
Let’s get practical!
Easy ways to start with embeddings
● Python
● Screaming Frog
Different models for creating embeddings
Embedding models from OpenAI
Model                     Released         Dimensions (default)
text-embedding-ada-002    December 2022    1536
text-embedding-3-small    January 2024     1536
text-embedding-3-large    January 2024     3072
*Source: benchmark from datacamp.com
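For those taking the Python route, the sketch below shows roughly what requesting embeddings from one of these models looks like with the official `openai` package. The helper name `embed_texts` is hypothetical; the client is passed in so the function can be tried without network access, and the live call at the bottom only runs when an `OPENAI_API_KEY` environment variable is set.

```python
def embed_texts(client, texts, model="text-embedding-3-small"):
    """Return one embedding (a list of floats) per input text.

    `client` is expected to behave like `openai.OpenAI()`.
    """
    response = client.embeddings.create(model=model, input=texts)
    # The response holds one item per input text, each with an `embedding` list.
    return [item.embedding for item in response.data]


if __name__ == "__main__":
    import os

    if os.environ.get("OPENAI_API_KEY"):
        from openai import OpenAI  # pip install openai

        vectors = embed_texts(OpenAI(), ["cat", "dog"])
        print(len(vectors), "embeddings of", len(vectors[0]), "dimensions each")
```

Each request is billed per token, so it is worth testing on a handful of pages before embedding a full crawl.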
Request: Screaming Frog → OpenAI API → Embedding model
Response: Embedding model → OpenAI API → Screaming Frog
Correct settings in SF
Make sure that your crawl configuration is set up properly:
● Extraction => Store Rendered HTML
● Rendering => JavaScript
Connect with OpenAI
Make the connection with OpenAI in your crawl:
● Add your OpenAI API key
● Choose the “Extract embeddings from page content” template
Visible in your crawl
Next, the embeddings will be visible in
your crawl:
● Go to the AI tab
● Scroll to the embeddings
Three scripts for embeddings
● Internal link opportunities
● Redirect mapping
● Duplicate content analysis
I will give them away
Internal link opportunities
Gathering pages → Checking similarity → Checking relevancy
SF crawl with embeddings → Webpage embedding and rest of the embeddings → Checking cosine similarity → Checking existing link in HTML → Internal link recommendations
Gathering pages
Gather the pages we want to optimize
Check cosine similarity
Check whether the similarity is at least 0.85
Check for in-content link
Check whether the potential link already exists in the HTML of the page
Looping through all pages
Loop through these steps to find all recommendations
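The steps above can be sketched as one short loop. This is an illustrative sketch, not the script from the talk: the function name `internal_link_opportunities`, the shape of the `pages` dictionary and the toy vectors are all assumptions, and the link check is a naive string match where a real script would parse the HTML and normalize relative and absolute URLs.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def internal_link_opportunities(pages, min_similarity=0.85):
    """Suggest links between similar pages that do not link to each other yet.

    `pages` maps a URL to {"embedding": [...], "html": "..."}.
    Returns (source_url, target_url, similarity) tuples, best first.
    """
    suggestions = []
    for source_url, source in pages.items():
        for target_url, target in pages.items():
            if source_url == target_url:
                continue
            score = cosine_similarity(source["embedding"], target["embedding"])
            if score < min_similarity:
                continue  # not similar enough to be a relevant link
            if f'href="{target_url}"' in source["html"]:
                continue  # naive check: the link already exists
            suggestions.append((source_url, target_url, score))
    suggestions.sort(key=lambda item: item[2], reverse=True)
    return suggestions


# Made-up pages with toy four-dimensional embeddings.
pages = {
    "/cats": {
        "embedding": [0.9, 0.7, -0.3, 0.6],
        "html": '<p>All about cats. <a href="/dogs">Dogs</a></p>',
    },
    "/dogs": {
        "embedding": [0.9, 0.6, -0.2, 0.8],
        "html": "<p>All about dogs.</p>",
    },
    "/sharks": {
        "embedding": [-0.4, 0.1, 0.9, 0.2],
        "html": "<p>All about sharks.</p>",
    },
}

suggestions = internal_link_opportunities(pages)
for source_url, target_url, score in suggestions:
    print(f"{source_url} -> {target_url} ({score:.2f})")
```

Here only `/dogs -> /cats` is suggested: `/cats` already links to `/dogs`, and `/sharks` is not similar enough to either page. Comparing every page with every other page is fine for a few thousand URLs but grows quadratically, so larger sites may need a vector index.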
Use my Google Colabs to run them
● Give it your input
● Run the script
● Download the results
What about databases?
Case
A client was not present in the informational and
orientation phase of the customer journey
Solution
Creating content based on a semantic search in a
vector database
Blog subject
Writing process
Extract data
Blog content
DB search
Finding the right blog
subjects
Searching for our products in
combination with ‘best’
Semantic searches for
products that match
Getting the products that match the blog subject
Getting the product
information
Getting the products that match the blog subject
Write content
With AI, we managed to produce a draft of the blog content
Case
Another client’s website didn’t appear in the AI Overviews for key keywords, while competitors’ sites did
Solution
Use embeddings and other signals to analyze and predict when content could be displayed
Keywords & URLs
Write
Rewrite
Update content
Validation
Optimized content
Prediction
Scrape the AI
overviews
Scraping content shown in
the AI overview
Scraping competitors
Scraping the content of the
competitor shown
More data for the
prediction
Gathering other relevant data
that is important
Prediction partially
based on embeddings
We predict whether the content is capable of being shown
Optimization advice
We (re)write content so that it is capable of being shown
Results
After implementing:
● Before: 32.63% of AI Overviews contained a link
● After: 54.48% of AI Overviews contained a link
+67% +49%
Increase in clicks
Increase in display of links
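As a quick check on how the before and after shares relate to the headline percentages, the arithmetic below computes the change in percentage points and as a relative increase. The relative increase works out to about 67%, which appears to correspond to one of the two headline figures, although the slide text does not make clear which label belongs to which number.

```python
before = 32.63  # % of AI Overviews containing a link, before
after = 54.48   # % of AI Overviews containing a link, after

point_change = after - before                   # change in percentage points
relative_change = point_change / before * 100   # change relative to the start

print(f"{point_change:.2f} percentage points, {relative_change:.0f}% relative increase")
```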
Join the embeddings
movement
Start experimenting with embeddings
and discover what’s possible.
This is where the future begins
Key takeaways
01. What embeddings are and how they help us as SEO specialists
02. How to automate an internal link audit and redirect mapping
03. The practical pointers for building a vector database and linking it to an LLM