

How AI "reads" content for real (for newbies and not so newbies)

The topic: How #AI and #LLMs "read" content.

Principles and actionable ideas, and, yes, I also talk about Query Fan-Out and chunks (with a slightly polemical take on chunks).

As the title says, it is for newbies, but not only for them.
It is precise, yet written in plain English with an actionable tone.

Newbies will understand difficult topics more easily.
Not-so-newbies will realize that maybe they misunderstood a thing or two.


gianluca fiorelli

July 02, 2025


Transcript

  1. How AI "reads" content for real (for newbies and not

    so newbies) Gianluca Fiorelli www.iloveseo.net
  2. What is an LLM? A Large Language Model (LLM) is

    a beefed-up version of your phone’s autocomplete feature. Just as your phone predicts what you’ll type based on the words you’ve already typed, an LLM predicts and generates text based on the words and patterns it has learned from massive amounts of text (like books, websites, and articles).
  3. How does an LLM work? An LLM chooses words to

    use based on probability. It predicts the next word by calculating which word is most likely to appear based on patterns learned from all the texts it has been trained on. An LLM does this at scale and with a very high level of complexity.
  4. How does an LLM work? It breaks down the sentence

    into smaller parts (like words or even word fragments). It looks at the context (the entire sentence or the entire conversation) to understand what we are asking or saying. It calculates the probability of what the next word should be based on patterns learned from billions of sentences it has seen before.
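The predict-the-next-word-by-probability idea above can be sketched with a toy bigram model (a deliberate simplification: real LLMs use transformer networks over subword tokens, and the tiny corpus here is invented for illustration):

```python
from collections import Counter, defaultdict

# A tiny invented corpus standing in for the "massive amounts of text"
# an LLM is trained on.
corpus = ("the weather in valencia is sunny . "
          "the weather in london is rainy . "
          "the weather in valencia is warm").split()

# For every word, count which words follow it (a bigram model).
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def next_word_probs(word):
    """Estimate the probability of each candidate next word."""
    counts = following[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(next_word_probs("weather"))  # {'in': 1.0}: "in" always follows here
print(next_word_probs("is"))       # "sunny", "rainy", "warm" split the mass
```

The model has no idea what weather is; it only knows which words tend to follow which, which is exactly the pattern-counting intuition of the slide, scaled down to a toy.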
  5. Entity Search An entity is a specific person, place, thing,

    or concept, and entities provide context to language. If an LLM cannot understand which entity we are referring to, it will not use our content as a source for providing an accurate answer.
  6. Entity Search When we ask a question, LLMs try to

    understand what the key entities are and how they relate to each other.
  7. Entity Search For example, if we ask, "What is the

    weather like in Valencia, Spain?", LLMs will: • Recognize "Valencia" as an entity.
  8. Entity Search For example, if we ask, "What's the weather

    like in Valencia, Spain?", LLMs will: • Recognize "Valencia" as an entity. • Use context to understand that we are referring to the Spanish city, and not Valencia in California or Valencia oranges.
  9. Entity Search For example, if we ask, “What is the

    weather like in Valencia, Spain?” LLMs will: • Recognize “Valencia” as an entity. • Use context to understand that it refers to the Spanish city, not Valencia in California or Valencia oranges. • Understand that the term ‘weather’ is related to environmental conditions such as temperature and precipitation.
  10. Entity Search For example, if we ask, “What’s the weather

    like in Valencia, Spain?” LLMs will: • Recognize “Valencia” as an entity. • Use context to understand that it refers to the Spanish city, not Valencia in California or Valencia oranges. • Understand that “weather” is related to environmental conditions like temperature and precipitation. • Do all of this by analyzing patterns across the data it has been trained on. If it has seen “Valencia” in weather contexts more often in conjunction with “Spain” than with other meanings, it will assign a higher probability to “Valencia” being the city in Spain than to the other possible meanings.
  11. Entity Search - Context LLMs rely on the surrounding context

    to get a clear understanding of entities in a text. If “Apple sales are down” appears alongside content about “iPhone and MacBook sales,” then “Apple” refers to the company. In “I picked an apple from the tree,” the tree context tells the LLM we literally mean the fruit. The LLM doesn’t “know” this like a human; it calculates it from recurring patterns.
  12. Entity Search <-> Knowledge Graph [Diagram: a knowledge graph linking “Valencia” (city in Spain) to “Paella” (traditional dish of Valencia), with related “Best of”, “Reviews” and “Ratings” nodes]
  13. Entity Search – How to To ensure that our content

    is recognized by LLMs and search engines, it is necessary to organize it in a way that highlights its entities and context.
  14. Entity Search – How to To ensure that our content

    is recognized by LLMs and search engines, it is necessary to organize it in a way that highlights its entities and context. • Structured data
  15. Entity Search – How to To ensure that our content

    is recognized by LLMs and search engines, it is necessary to organize it in a way that highlights its entities and context. • Structured data • Topical authority
  16. Entity Search – How to – Topical authority [Diagram: a central PILLAR page linked to six surrounding CLUSTER CONTENT pages]
  17. Entity Search – How to To ensure that our content

    is recognized by LLMs and search engines, it is necessary to organize it in a way that highlights its entities and context. • Structured data • Topical authority • Internal linking (with “semantic” anchors)
  18. Entity Search – How to – Topical authority [Diagram: a central PILLAR page linked to six surrounding CLUSTER CONTENT pages]
  19. Entity Search – How to To ensure that our content

    is recognized by LLMs and search engines, it is necessary to organize it in a way that highlights its entities and context. • Structured data • Topical authority • Internal linking (with “semantic” anchors) • Use “natural language”
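The "structured data" bullet refers to schema.org markup. A minimal sketch of building a JSON-LD snippet in Python (the `@type` and property names are real schema.org terms; the business details are invented placeholders):

```python
import json

# Minimal schema.org markup for a local business. The property names are
# real schema.org terms; every concrete value is an invented placeholder.
restaurant = {
    "@context": "https://schema.org",
    "@type": "Restaurant",
    "name": "Example Paella House",  # hypothetical business name
    "servesCuisine": "Spanish",
    "address": {
        "@type": "PostalAddress",
        "addressLocality": "Valencia",
        "addressCountry": "ES",
    },
}

# On a real page this JSON would sit inside a
# <script type="application/ld+json"> tag in the HTML.
print(json.dumps(restaurant, indent=2))
```

Markup like this names the entity and its relationships explicitly, instead of leaving the machine to infer them from prose alone.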
  20. Entity Search – How to “Pizza is very popular in New York.” BETTER: “New York-style pizza is known for its thin crust and generous use of mozzarella, and has been a popular dish in the Big Apple since Italian immigrants introduced it to the city in the early 20th century.”
  21. An embedding is a way of transforming words, phrases, and

    entities into mathematical representations (essentially, a set of numbers) that capture their meaning and relationships. • "Cat" and "dog" would have close embeddings because they are both animals. • "Cat" and "lion" would be even closer because they are both members of the feline family. • "Cat" and "car" would be distant because they are not semantically related. Embeddings – What are they?
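The cat/dog/lion/car relationships above can be illustrated with hand-made three-number vectors (real embeddings have hundreds of dimensions learned from data; the values below are invented so the distances come out as described):

```python
import math

# Invented three-dimensional "embeddings": roughly (animal-ness,
# feline-ness, vehicle-ness). Real embeddings have hundreds of
# dimensions learned from data.
embeddings = {
    "cat":  [0.9, 0.9, 0.0],
    "dog":  [0.9, 0.1, 0.0],
    "lion": [0.9, 0.8, 0.0],
    "car":  [0.0, 0.0, 0.9],
}

def cosine(a, b):
    """Cosine similarity: 1 = same direction, 0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine(embeddings["cat"], embeddings["lion"]))  # highest: both felines
print(cosine(embeddings["cat"], embeddings["dog"]))   # lower: both animals
print(cosine(embeddings["cat"], embeddings["car"]))   # 0.0: unrelated
```

With these toy vectors, cat/lion scores higher than cat/dog, and cat/car scores zero, mirroring the bullets above.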
  22. Entity Search relies heavily on embeddings because entities are not

    just words; they represent concepts and relationships. • Amazon → Jungle → River • Amazon → Prime → The company Embeddings – What are they?
  23. The context of the query determines from which embedding cluster

    the LLM selects sources for their answers. Amazon (company) Ecommerce Logistics Amazon (river) South America Rain forest Embeddings – Why they matter
  24. Attention Mechanisms – What are they? Attention Mechanisms allow a

    language model to focus only on the most relevant parts of a text when generating or interpreting a sentence, just like a copywriter does when choosing the keywords to emphasize in a title. In practice, they help the model to “decide what matters most” among all the words, improving the coherence and meaning of the text produced.
  25. 1. "The customer ordered a bouquet of flowers because it

    was his wife's birthday." 2. Question: "Why did he order the flowers?" 3. Attention Mechanisms → "birthday" and "wife", not "bouquet" or "customer". Attention Mechanisms – What are they?
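The weighting in the bouquet example can be sketched with a softmax over relevance scores (the scores below are hand-assigned, invented numbers; real attention derives them from learned query and key vectors):

```python
import math

# Hand-assigned relevance of each word to the question "Why did he order
# the flowers?". These numbers are invented for illustration; real
# attention computes scores from learned query/key vectors.
scores = {"customer": 0.5, "ordered": 1.0, "bouquet": 0.8,
          "flowers": 0.9, "wife": 2.5, "birthday": 3.0}

# Softmax turns raw scores into attention weights that sum to 1.
exps = {word: math.exp(s) for word, s in scores.items()}
total = sum(exps.values())
weights = {word: e / total for word, e in exps.items()}

for word, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{word:10s} {weight:.2f}")  # "birthday" and "wife" dominate
```

The exponential in the softmax is what makes attention sharp: a modestly higher score grabs a disproportionately large share of the total weight.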
  26. Attention Mechanisms – How to Use clear and direct syntactic

    structures Short and well-structured sentences help the model to immediately understand the relationships between subject, verb and object. “We offer free shipping over €50" is better than "If you order a certain quantity you may have, in some cases, shipping included".
  27. Attention Mechanisms – How to Put key information at the

    beginning of the sentence or paragraph LLMs give more weight to the first words in narrow “contexts.” Start paragraphs with the main idea → "Our olive oil is 100% organic. It is produced in..."
  28. Attention Mechanisms – How to Use explicit and unambiguous words

    Let's avoid synonyms that are too vague or poetic unless they serve a purpose: LLMs work better with precise terms. "Waterproof running shoes for winter" is better than "footwear resistant to inclement weather".
  29. Attention Mechanisms – HOW Strengthen thematic coherence using related entities

    and words If we are talking about “organic wine”, we also use terms like “grape”, “organic certification”, “sulphite-free”, etc. in the text. Attention Mechanisms will connect them together. This improves interpretability and semantic indexing.
  30. Attention Mechanisms – HOW Use descriptive titles, bullets and headings

    (H1, H2…) Attention Mechanisms take advantage of the visual and semantic structure of the text. Breaking up and structuring the text helps the model (and the user) to focus on the main concepts.
  31. Cosine Similarity – What is it? Cosine similarity is a

    way that language models use to understand how much two or more texts “point in the same direction,” that is, how much they talk about similar things, even if they use different words. Imagine we have two sentences: • “Book your summer vacation in Sardinia now.” • “Offers for beach trips in Italy this summer.”
  32. • “Book your summer vacation in Sardinia now.” • “Offers

    for beach trips in Italy this summer.” Even though the two sentences are not identical, they are talking about the same thing: summer beach vacations in Italy. Cosine similarity measures how semantically close these two texts are, and returns a value between -1 (opposite) and 1 (identical). Cosine Similarity – What is it?
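The Sardinia/Italy comparison can be sketched with a bag-of-words version of cosine similarity (a deliberately crude stand-in: production systems compare dense embeddings, which also capture synonyms, while raw word counts only capture literal overlap):

```python
import math
from collections import Counter

def bow_cosine(text_a, text_b):
    """Cosine similarity over raw word counts (bag of words).
    Dense embeddings would also catch synonyms like "vacation"/"trips";
    counts only catch literal word overlap."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in set(a) | set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

print(bow_cosine("book your summer vacation in sardinia now",
                 "offers for beach trips in italy this summer"))  # partial overlap
print(bow_cosine("book your summer vacation in sardinia now",
                 "quarterly earnings report for tech companies"))  # 0.0, no overlap
```

Even this crude version separates related from unrelated texts; embedding-based similarity would also score the two vacation sentences much closer, because it sees meaning, not just shared words.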
  33. Cosine Similarity – Competitive analysis Cosine Similarity helps us understand

    how semantically similar our content is to pages that already rank well on Google, and how it differs.
  34. Cosine Similarity – Why With the documents it ranks, Google

    is telling us what it considers relevant. By analyzing the pages in the top 10 and calculating the cosine similarity between our content and theirs, we can understand how “consistent” our text is with what Google considers useful for the targeted query.
  35. Cosine Similarity – Attention! We will avoid two common mistakes:

    1. Writing something too semantically distant → Google does not understand that we are answering the question. 2. Copying too much → our content is indistinguishable from the others, so Google has no reason to reward it.
  36. Cosine Similarity – Why Let's find spaces to differentiate ourselves

    If all the pages in the top 10 are very similar to each other, and we can cover the same intent with a different but semantically coherent angle, then we will have a good chance of standing out from all that uniformity.
  37. Cosine Similarity – Example If we are trying to rank

    for “holidays in Sardinia with dog”, and the pages in the top 10 all talk about dog-friendly beaches, but we only talk about hotels, the cosine similarity will be low. If instead we integrate both themes (hotels + beaches but also transport + activities), our page will be semantically richer and closer to the center of the query.
  38. Cosine Similarity – Example 1 → We upload the documents

    we want to analyze 2 → We clearly indicate the context of the action to be carried out 3 → We clearly explain the task and its steps
  39. Cosine Similarity – Example Context: I want to update the

    content you see in the doc "my article" so that it can be positioned in the top 10 for the search "vacanze in Sardegna con il cane". For this query, the pages whose main texts you can see in these other Docs are positioned in the top 3: "article 1", "article 2" and "article 3". Task: I ask you to compare the relevance of the four texts to the query using cosine similarity. Once you have done the analysis, tell me what the gaps are at the text, entity and concept levels for which the content of "my article" is less relevant than the others.
  40. Entity Salience – What is it? Entity Salience measures how

    central and relevant an entity (person, place, concept, etc.) is within a text, that is, how much the content "revolves around it".
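How "the content revolves around an entity" can be quantified is sketched below with a toy frequency-and-position heuristic (an invented proxy: real salience models use far richer signals, such as syntactic role and coreference):

```python
def crude_salience(text, entity):
    """Toy salience proxy: mention frequency, boosted when the entity
    appears in the first sentence. Real salience models use far richer
    signals; this heuristic is only illustrative."""
    words = [w.strip(".,!?") for w in text.lower().split()]
    if not words:
        return 0.0
    mentions = words.count(entity.lower())
    boost = 1.5 if entity.lower() in text.lower().split(".")[0] else 1.0
    return mentions / len(words) * boost

text = ("Salento is known for its beaches. Salento attracts visitors "
        "every summer. The beaches of Salento are famous.")
print(crude_salience(text, "Salento"))  # high: the text revolves around it
print(crude_salience(text, "summer"))   # low: mentioned once, in passing
```

Even this crude proxy captures the key idea: an entity mentioned early and often is the one the text is "about".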
  41. Monosemanticity – What is it? Monosemanticity is when a neuron

    in the model fires only for a specific meaning, making the concept clearer and the model more interpretable.
  42. Monosemanticity – Practical terms Let's avoid mixing too many meanings

    in a single sentence Let's help the model's neurons to associate a sentence with only one concept at a time. YES “Barolo is a full-bodied, ageable red wine from Piedmont.” NO “Barolo is a famous, elegant, strong, refined, celebrated Italian wine...”
  43. Be specific, not general Models understand concepts better if we

    use precise terms. “Croissant with custard” instead of “breakfast cake.” Avoid vague expressions like “something nice to eat in the morning.” Monosemanticity – Practical terms
  44. Let's put the key information at the beginning (→ Attention

    Mechanisms) Models give more weight to what comes first. YES! “The Bernina Train is one of the most beautiful scenic journeys in Europe.” NO! “Among the many possible travel experiences, some stand out for their beauty...”
  45. Use clear and repeated entities consistently (→ Entity Salience) Let's

    help the model understand what the text is really about. “Salento is known for its beaches, such as Punta Prosciutto and Torre Lapillo.” YES! “This region offers many beautiful locations for those who love the sea.” NO!
  46. Maintain semantic coherence with the main topic (→ Cosine Similarity)

    We use terms that are related and relevant to the query. In an article about family hotels, we include entertainment, children's menus, family rooms. Talking about luxury spas for couples takes us away from the semantic center.
  47. Be specific, not generic (→ Monosemanticity) We help the model's

    neurons to bind to precise concepts. “Croissant with custard” YES! “Breakfast dessert” NO!
  48. Don't mix different concepts in the same sentence (→ Monosemanticity

    + Attention) Each sentence should convey a clear and single idea. YES: “Chianti is a Tuscan wine suitable for red meat.” NO: “Chianti is a wine, but also a territory, and it is popular both in Italy and abroad, especially during the summer.”
  49. Start with the questions your audience is asking, and use

    fan-out queries to uncover gaps in your content strategy
  50. Are the implicit queries in the conversational searches we are

    analyzing answered by the content on our website? Gap analysis with Query Fan-Out
  51. If the answer is yes, and we are visible both

    as a source of LLMs and in classic search results, great! Gap analysis with Query Fan-Out
  52. If the answer is yes, but we are not used

    as a source nor are we particularly visible in classic searches, then we will have to improve the quality and relevance of our content. Gap analysis with Query Fan-Out
  53. If the answer is yes and we are positioning ourselves

    in classic search results, but we are not used by LLMs as a source, then we will have to improve “factors” such as the semantic clarity of our content. Gap analysis with Query Fan-Out
  54. If the answer is yes and we are used as

    a source by LLMs, but we do not rank in classic search, this means that the only valid thing in our content is the chunk used by LLMs, but that everything else should be reviewed. Gap analysis with Query Fan-Out
  55. If the answer is no, then we will need to

    create content that can answer those implicit questions/needs along the entire potential search/customer journey. Gap analysis with Query Fan-Out
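The five cases above form a decision tree. As a compact sketch (the advice strings paraphrase the slides; the function name and parameters are ours):

```python
def fan_out_gap_advice(answers_query, cited_by_llms, ranks_in_serp):
    """Map the visibility combinations from the gap analysis to their
    recommended action (the advice strings paraphrase the deck)."""
    if not answers_query:
        return "Create content answering the implicit questions along the journey."
    if cited_by_llms and ranks_in_serp:
        return "Great! Visible both as an LLM source and in classic search."
    if not cited_by_llms and not ranks_in_serp:
        return "Improve the overall quality and relevance of the content."
    if ranks_in_serp:  # ranks, but LLMs do not cite it
        return "Improve factors such as the semantic clarity of the content."
    # cited by LLMs, but does not rank in classic search
    return "Only the cited chunk works; review everything else on the page."

print(fan_out_gap_advice(answers_query=True, cited_by_llms=False,
                         ranks_in_serp=True))
```

Encoding the analysis this way makes the audit repeatable: for each implicit query, record the three yes/no observations and the next action follows mechanically.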
  56. Content Chunks – What are they? In the context of

    AI Search, a “content chunk” is a self-contained block of content (usually a paragraph, section, or short answer) that can be understood and reused individually by an AI to answer a user’s question, without having to read the entire page.
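What reusing a block "individually" implies can be sketched as paragraph-level splitting (a minimal sketch: real retrieval pipelines also size-limit and overlap chunks):

```python
def split_into_chunks(page_text):
    """Split a page into paragraph-level chunks: the self-contained
    blocks an AI system can embed, retrieve and reuse individually.
    Real pipelines also size-limit and overlap chunks."""
    paragraphs = [p.strip() for p in page_text.split("\n\n")]
    return [p for p in paragraphs if p]

page = """New York-style pizza has a thin, foldable crust.

It was introduced by Italian immigrants in the early 20th century.

Classic toppings include mozzarella and tomato sauce."""

chunks = split_into_chunks(page)
print(len(chunks))  # 3 self-contained chunks, each answerable on its own
```

Note that the second chunk only stands on its own if "It" is resolvable; this is exactly why the slides insist on clear, explicit entities in every paragraph.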
  57. Content Chunks – What to do If we have worked

    well, that is, we have created our content without forgetting how LLMs “read” and “understand” the content… then we don’t have to do anything.
  58. In other words, chunk optimization is dangerously similar to the

    classic concept of “SEO copywriting”: nonsense. Content Chunks – What to do
  59. If the content is written in a natural way, and

    if we have internalized concepts such as clarity, univocity, semantic coherence, the "inverted pyramid", etc., then the "chunks" are already present and "optimized" in our content. Content Chunks – What to do
  60. Content Chunks – Useless concept? NO! Knowing that LLMs think

    in chunks rather than entire pages can give us the ability to update existing content by adding a “chunk” instead of creating a completely new one.