

How AI "reads" content for real (for newbies and not so newbies)

The topic: How #AI and #LLMs "read" content.

Principles and actionable ideas, and, yes, I also talk about Query Fan-Out and chunks (with a slightly polemical take on chunks).

As the title says, it is for newbies, but not only for them.
It is precise, yet written in plain English with an actionable tone.

Newbies will understand difficult topics more easily.
Not-so-newbies will realize that maybe they misunderstood a thing or two.


gianluca fiorelli

July 02, 2025


Transcript

  1. How AI "reads" content for real (for newbies and not

    so newbies) Gianluca Fiorelli www.iloveseo.net
  2. What is an LLM? A Large Language Model (LLM) is

    a beefed-up version of your phone’s autocomplete feature. Just as your phone predicts what you’ll type based on the words you’ve already typed, an LLM predicts and generates text based on the words and patterns it has learned from massive amounts of text (like books, websites, and articles).
  3. How does an LLM work? An LLM chooses words to

    use based on probability. It predicts the next word by calculating which word is most likely to appear based on patterns learned from all the texts it has been trained on. An LLM does this at scale and with a very high level of complexity.
  4. How does an LLM work? It breaks down the sentence

    into smaller parts (like words or even word fragments). It looks at the context (the entire sentence or the entire conversation) to understand what we are asking or saying. It calculates the probability of what the next word should be based on patterns learned from billions of sentences it has seen before.
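The predict-the-next-word-by-probability idea above can be sketched with a toy bigram model (a deliberate simplification: real LLMs use transformer networks over subword tokens, and the tiny corpus here is invented for illustration):

```python
from collections import Counter, defaultdict

# A tiny invented corpus standing in for the "massive amounts of text"
# an LLM is trained on.
corpus = ("the weather in valencia is sunny . "
          "the weather in london is rainy . "
          "the weather in valencia is warm").split()

# For every word, count which words follow it (a bigram model).
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def next_word_probs(word):
    """Estimate the probability of each candidate next word."""
    counts = following[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(next_word_probs("weather"))  # {'in': 1.0}: "in" always follows here
print(next_word_probs("is"))       # "sunny", "rainy", "warm" split the mass
```

The model has no idea what weather is; it only knows which words tend to follow which, which is exactly the pattern-counting intuition of the slide, scaled down to a toy.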
  5. Entity Search An entity is a specific person, place, thing,

    or concept, and entities provide context to language. If an LLM cannot understand which entity we are referring to, it will not use our content as a source for providing an accurate answer.
  6. Entity Search When we ask a question, LLMs try to

    understand what the key entities are and how they relate to each other.
  7. Entity Search For example, if we ask, "What is the

    weather like in Valencia, Spain?", LLMs will: • Recognize "Valencia" as an entity.
  8. Entity Search For example, if we ask, "What's the weather

    like in Valencia, Spain?", LLMs will: • Recognize "Valencia" as an entity. • Use context to understand that we are referring to the Spanish city, and not Valencia in California or Valencia oranges.
  9. Entity Search For example, if we ask, “What is the

    weather like in Valencia, Spain?” LLMs will: • Recognize “Valencia” as an entity. • Use context to understand that it refers to the Spanish city, not Valencia in California or Valencia oranges. • Understand that the term ‘weather’ is related to environmental conditions such as temperature and precipitation.
  10. Entity Search For example, if we ask, “What’s the weather

    like in Valencia, Spain?” LLMs will: • Recognize “Valencia” as an entity. • Use context to understand that it refers to the Spanish city, not Valencia in California or Valencia oranges. • Understand that “weather” is related to environmental conditions like temperature and precipitation. • Do all of this by analyzing patterns across the data it has been trained on. If it has seen “Valencia” in weather contexts more often in conjunction with “Spain” than with other meanings, it will assign a higher probability to “Valencia” being the city in Spain than to the other possible meanings.
  11. Entity Search - Context LLMs rely on the surrounding context

    to get a clear understanding of entities in a text. If “Apple sales are down” appears alongside content about “iPhone and MacBook sales,” then “Apple” refers to the company. In “I picked an apple from the tree,” the tree context tells the LLM we literally mean the fruit. The LLM doesn’t “know” this like a human; it calculates it from recurring patterns.
  12. Entity Search <-> Knowledge Graph [Diagram: a knowledge graph linking “Valencia” (city in Spain) to “Paella” (traditional dish of Valencia), with related “Best of”, “Reviews” and “Ratings” nodes]
  13. Entity Search – How to To ensure that our content

    is recognized by LLMs and search engines, it is necessary to organize it in a way that highlights its entities and context.
  14. Entity Search – How to To ensure that our content

    is recognized by LLMs and search engines, it is necessary to organize it in a way that highlights its entities and context. • Structured data
  15. Entity Search – How to To ensure that our content

    is recognized by LLMs and search engines, it is necessary to organize it in a way that highlights its entities and context. • Structured data • Topical authority
  16. Entity Search – How to – Topical authority [Diagram: a central PILLAR page linked to six surrounding CLUSTER CONTENT pages]
  17. Entity Search – How to To ensure that our content

    is recognized by LLMs and search engines, it is necessary to organize it in a way that highlights its entities and context. • Structured data • Topical authority • Internal linking (with “semantic” anchors)
  18. Entity Search – How to – Topical authority [Diagram: a central PILLAR page linked to six surrounding CLUSTER CONTENT pages]
  19. Entity Search – How to To ensure that our content

    is recognized by LLMs and search engines, it is necessary to organize it in a way that highlights its entities and context. • Structured data • Topical authority • Internal linking (with “semantic” anchors) • Use “natural language”
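The "structured data" bullet refers to schema.org markup. A minimal sketch of building a JSON-LD snippet in Python (the `@type` and property names are real schema.org terms; the business details are invented placeholders):

```python
import json

# Minimal schema.org markup for a local business. The property names are
# real schema.org terms; every concrete value is an invented placeholder.
restaurant = {
    "@context": "https://schema.org",
    "@type": "Restaurant",
    "name": "Example Paella House",  # hypothetical business name
    "servesCuisine": "Spanish",
    "address": {
        "@type": "PostalAddress",
        "addressLocality": "Valencia",
        "addressCountry": "ES",
    },
}

# On a real page this JSON would sit inside a
# <script type="application/ld+json"> tag in the HTML.
print(json.dumps(restaurant, indent=2))
```

Markup like this names the entity and its relationships explicitly, instead of leaving the machine to infer them from prose alone.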
  20. Entity Search – How to “Pizza is very popular in New York.” BETTER: “New York-style pizza is known for its thin crust and generous use of mozzarella, and has been a popular dish in the Big Apple since Italian immigrants introduced it to the city in the early 20th century.”
  21. An embedding is a way of transforming words, phrases, and

    entities into mathematical representations (essentially, a set of numbers) that capture their meaning and relationships. • "Cat" and "dog" would have close embeddings because they are both animals. • "Cat" and "lion" would be even closer because they are both members of the feline family. • "Cat" and "car" would be distant because they are not semantically related. Embeddings – What are they?
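The cat/dog/lion/car relationships above can be illustrated with hand-made three-number vectors (real embeddings have hundreds of dimensions learned from data; the values below are invented so the distances come out as described):

```python
import math

# Invented three-dimensional "embeddings": roughly (animal-ness,
# feline-ness, vehicle-ness). Real embeddings have hundreds of
# dimensions learned from data.
embeddings = {
    "cat":  [0.9, 0.9, 0.0],
    "dog":  [0.9, 0.1, 0.0],
    "lion": [0.9, 0.8, 0.0],
    "car":  [0.0, 0.0, 0.9],
}

def cosine(a, b):
    """Cosine similarity: 1 = same direction, 0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine(embeddings["cat"], embeddings["lion"]))  # highest: both felines
print(cosine(embeddings["cat"], embeddings["dog"]))   # lower: both animals
print(cosine(embeddings["cat"], embeddings["car"]))   # 0.0: unrelated
```

With these toy vectors, cat/lion scores higher than cat/dog, and cat/car scores zero, mirroring the bullets above.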
  22. Entity Search relies heavily on embeddings because entities are not

    just words; they represent concepts and relationships. • Amazon → Jungle → River • Amazon → Prime → The company Embeddings – What are they?
  23. The context of the query determines from which embedding cluster

    the LLM selects sources for their answers. Amazon (company) Ecommerce Logistics Amazon (river) South America Rain forest Embeddings – Why they matter
  24. Attention Mechanisms – What are they? Attention Mechanisms allow a

    language model to focus only on the most relevant parts of a text when generating or interpreting a sentence, just like a copywriter does when choosing the keywords to emphasize in a title. In practice, they help the model to “decide what matters most” among all the words, improving the coherence and meaning of the text produced.
  25. 1. "The customer ordered a bouquet of flowers because it

    was his wife's birthday." 2. Question: "Why did he order the flowers?" 3. Attention Mechanisms → "birthday" and "wife", not "bouquet" or "customer". Attention Mechanisms – What are they?
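The weighting in the bouquet example can be sketched with a softmax over relevance scores (the scores below are hand-assigned, invented numbers; real attention derives them from learned query and key vectors):

```python
import math

# Hand-assigned relevance of each word to the question "Why did he order
# the flowers?". These numbers are invented for illustration; real
# attention computes scores from learned query/key vectors.
scores = {"customer": 0.5, "ordered": 1.0, "bouquet": 0.8,
          "flowers": 0.9, "wife": 2.5, "birthday": 3.0}

# Softmax turns raw scores into attention weights that sum to 1.
exps = {word: math.exp(s) for word, s in scores.items()}
total = sum(exps.values())
weights = {word: e / total for word, e in exps.items()}

for word, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{word:10s} {weight:.2f}")  # "birthday" and "wife" dominate
```

The exponential in the softmax is what makes attention sharp: a modestly higher score grabs a disproportionately large share of the total weight.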
  26. Attention Mechanisms – How to Use clear and direct syntactic

    structures Short and well-structured sentences help the model to immediately understand the relationships between subject, verb and object. “We offer free shipping over €50" is better than "If you order a certain quantity you may have, in some cases, shipping included".
  27. Attention Mechanisms – How to Put key information at the

    beginning of the sentence or paragraph LLMs give more weight to the first words in narrow “contexts.” Start paragraphs with the main idea → "Our olive oil is 100% organic. It is produced in..."
  28. Attention Mechanisms – How to Use explicit and unambiguous words

    Let's avoid synonyms that are too vague or poetic unless they serve a purpose: LLMs work better with precise terms. "Waterproof running shoes for winter" is better than "footwear resistant to inclement weather".
  29. Attention Mechanisms – HOW Strengthen thematic coherence using related entities

    and words If we are talking about “organic wine”, we also use terms like “grape”, “organic certification”, “sulphite-free”, etc. in the text. Attention Mechanisms will connect them together. This improves interpretability and semantic indexing.
  30. Attention Mechanisms – HOW Use descriptive titles, bullets and headings

    (H1, H2…) Attention Mechanisms take advantage of the visual and semantic structure of the text. Breaking up and structuring the text helps the model (and the user) to focus on the main concepts.
  31. Cosine Similarity – What is it? Cosine similarity is a

    way that language models use to understand how much two or more texts “point in the same direction,” that is, how much they talk about similar things, even if they use different words. Imagine we have two sentences: • “Book your summer vacation in Sardinia now.” • “Offers for beach trips in Italy this summer.”
  32. • “Book your summer vacation in Sardinia now.” • “Offers

    for beach trips in Italy this summer.” Even though the two sentences are not identical, they are talking about the same thing: summer beach vacations in Italy. Cosine similarity measures how semantically close these two texts are, and returns a value between -1 (opposite) and 1 (identical). Cosine Similarity – What is it?
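The Sardinia/Italy comparison can be sketched with a bag-of-words version of cosine similarity (a deliberately crude stand-in: production systems compare dense embeddings, which also capture synonyms, while raw word counts only capture literal overlap):

```python
import math
from collections import Counter

def bow_cosine(text_a, text_b):
    """Cosine similarity over raw word counts (bag of words).
    Dense embeddings would also catch synonyms like "vacation"/"trips";
    counts only catch literal word overlap."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in set(a) | set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

print(bow_cosine("book your summer vacation in sardinia now",
                 "offers for beach trips in italy this summer"))  # partial overlap
print(bow_cosine("book your summer vacation in sardinia now",
                 "quarterly earnings report for tech companies"))  # 0.0, no overlap
```

Even this crude version separates related from unrelated texts; embedding-based similarity would also score the two vacation sentences much closer, because it sees meaning, not just shared words.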
  33. Cosine Similarity – Competitive analysis Cosine Similarity helps us understand

    how semantically similar our content is to pages that already rank well on Google, and how it differs.
  34. Cosine Similarity – Why With the documents it ranks, Google

    is telling us what it considers relevant. By analyzing the pages in the top 10 and calculating the cosine similarity between our content and theirs, we can understand how “consistent” our text is with what Google considers useful for the targeted query.
  35. Cosine Similarity – Attention! We will avoid two common mistakes:

    1. Writing something too semantically distant → Google does not understand that we are answering the question. 2. Copying too much → our content is indistinguishable from the others, so Google has no reason to reward it.
  36. Cosine Similarity – Why Let's find spaces to differentiate ourselves

    If all the pages in the top 10 are very similar to each other, and we can cover the same intent with a different but semantically coherent angle, then we will have a good chance of standing out from all that uniformity.
  37. Cosine Similarity – Example If we are trying to rank

    for “holidays in Sardinia with dog”, and the pages in the top 10 all talk about dog-friendly beaches, but we only talk about hotels, the cosine similarity will be low. If instead we integrate both themes (hotels + beaches but also transport + activities), our page will be semantically richer and closer to the center of the query.
  38. Cosine Similarity – Example 1 → We upload the documents

    we want to analyze 2 → We clearly indicate the context of the action to be carried out 3 → We clearly explain the task and its steps
  39. Cosine Similarity – Example Context: I want to update the

    content you see in the doc "my article" so that it can be positioned in the top 10 for the search "vacanze in Sardegna con il cane". For this query, the pages whose main texts you can see in these other Docs are positioned in the top 3: "article 1", "article 2" and "article 3". Task: I ask you to compare the relevance of the four texts to the query using cosine similarity. Once you have done the analysis, tell me what the gaps are at the text, entity and concept levels for which the content of "my article" is less relevant than the others.
  40. Entity Salience – What is it? Entity Salience measures how

    central and relevant an entity (person, place, concept, etc.) is within a text, that is, how much the content "revolves around it".
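How "the content revolves around an entity" can be quantified is sketched below with a toy frequency-and-position heuristic (an invented proxy: real salience models use far richer signals, such as syntactic role and coreference):

```python
def crude_salience(text, entity):
    """Toy salience proxy: mention frequency, boosted when the entity
    appears in the first sentence. Real salience models use far richer
    signals; this heuristic is only illustrative."""
    words = [w.strip(".,!?") for w in text.lower().split()]
    if not words:
        return 0.0
    mentions = words.count(entity.lower())
    boost = 1.5 if entity.lower() in text.lower().split(".")[0] else 1.0
    return mentions / len(words) * boost

text = ("Salento is known for its beaches. Salento attracts visitors "
        "every summer. The beaches of Salento are famous.")
print(crude_salience(text, "Salento"))  # high: the text revolves around it
print(crude_salience(text, "summer"))   # low: mentioned once, in passing
```

Even this crude proxy captures the key idea: an entity mentioned early and often is the one the text is "about".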
  41. Monosemanticity – What is it? Monosemanticity is when a neuron

    in the model fires only for a specific meaning, making the concept clearer and the model more interpretable.
  42. Monosemanticity – Practical terms Let's avoid mixing too many meanings

    in a single sentence Let's help the model's neurons to associate a sentence with only one concept at a time. YES “Barolo is a full-bodied, ageable red wine from Piedmont.” NO “Barolo is a famous, elegant, strong, refined, celebrated Italian wine...”
  43. Be specific, not general Models understand concepts better if we

    use precise terms. “Croissant with custard” instead of “breakfast cake.” Avoid vague expressions like “something nice to eat in the morning.” Monosemanticity – Practical terms
  44. Let's put the key information at the beginning (→ Attention

    Mechanisms) Models give more weight to what comes first. YES! “The Bernina Train is one of the most beautiful scenic journeys in Europe.” NO! “Among the many possible travel experiences, some stand out for their beauty...”
  45. Use clear and repeated entities consistently (→ Entity Salience) Let's

    help the model understand what the text is really about. “Salento is known for its beaches, such as Punta Prosciutto and Torre Lapillo.” YES! “This region offers many beautiful locations for those who love the sea.” NO!
  46. Maintain semantic coherence with the main topic (→ Cosine Similarity)

    We use terms that are related and relevant to the query. In an article about family hotels, we include entertainment, children's menus, family rooms. Talking about luxury spas for couples takes us away from the semantic center.
  47. Be specific, not generic (→ Monosemanticity) We help the model's

    neurons to bind to precise concepts. “Croissant with custard” YES! “Breakfast dessert” NO!
  48. Don't mix different concepts in the same sentence (→ Monosemanticity

    + Attention) Each sentence should convey a clear and single idea. YES: “Chianti is a Tuscan wine suitable for red meat.” NO: “Chianti is a wine, but also a territory, and it is popular both in Italy and abroad, especially during the summer.”
  49. Start with the questions your audience is asking, and use

    fan-out queries to uncover gaps in your content strategy
  50. Are the implicit queries in the conversational searches we are

    analyzing answered by the content on our website? Gap analysis with Query Fan-Out
  51. If the answer is yes, and we are visible both

    as a source of LLMs and in classic search results, great! Gap analysis with Query Fan-Out
  52. If the answer is yes, but we are not used

    as a source nor are we particularly visible in classic searches, then we will have to improve the quality and relevance of our content. Gap analysis with Query Fan-Out
  53. If the answer is yes and we are positioning ourselves

    in classic search results, but we are not used by LLMs as a source, then we will have to improve “factors” such as the semantic clarity of our content. Gap analysis with Query Fan-Out
  54. If the answer is yes and we are used as

    a source by LLMs, but we do not rank in classic search, this means that the only valid thing in our content is the chunk used by LLMs, but that everything else should be reviewed. Gap analysis with Query Fan-Out
  55. If the answer is no, then we will need to

    create content that can answer those implicit questions/needs along the entire potential search/customer journey. Gap analysis with Query Fan-Out
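The five cases above form a decision tree. As a compact sketch (the advice strings paraphrase the slides; the function name and parameters are ours):

```python
def fan_out_gap_advice(answers_query, cited_by_llms, ranks_in_serp):
    """Map the visibility combinations from the gap analysis to their
    recommended action (the advice strings paraphrase the deck)."""
    if not answers_query:
        return "Create content answering the implicit questions along the journey."
    if cited_by_llms and ranks_in_serp:
        return "Great! Visible both as an LLM source and in classic search."
    if not cited_by_llms and not ranks_in_serp:
        return "Improve the overall quality and relevance of the content."
    if ranks_in_serp:  # ranks, but LLMs do not cite it
        return "Improve factors such as the semantic clarity of the content."
    # cited by LLMs, but does not rank in classic search
    return "Only the cited chunk works; review everything else on the page."

print(fan_out_gap_advice(answers_query=True, cited_by_llms=False,
                         ranks_in_serp=True))
```

Encoding the analysis this way makes the audit repeatable: for each implicit query, record the three yes/no observations and the next action follows mechanically.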
  56. Content Chunks – What are they? In the context of

    AI Search, a “content chunk” is a self-contained block of content (usually a paragraph, section, or short answer) that can be understood and reused individually by an AI to answer a user’s question, without having to read the entire page.
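What reusing a block "individually" implies can be sketched as paragraph-level splitting (a minimal sketch: real retrieval pipelines also size-limit and overlap chunks):

```python
def split_into_chunks(page_text):
    """Split a page into paragraph-level chunks: the self-contained
    blocks an AI system can embed, retrieve and reuse individually.
    Real pipelines also size-limit and overlap chunks."""
    paragraphs = [p.strip() for p in page_text.split("\n\n")]
    return [p for p in paragraphs if p]

page = """New York-style pizza has a thin, foldable crust.

It was introduced by Italian immigrants in the early 20th century.

Classic toppings include mozzarella and tomato sauce."""

chunks = split_into_chunks(page)
print(len(chunks))  # 3 self-contained chunks, each answerable on its own
```

Note that the second chunk only stands on its own if "It" is resolvable; this is exactly why the slides insist on clear, explicit entities in every paragraph.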
  57. Content Chunks – What to do If we have worked

    well, that is, we have created our content without forgetting how LLMs “read” and “understand” the content… then we don’t have to do anything.
  58. In other words, chunk optimization is dangerously similar to the

    classic concept of “SEO copywriting”: nonsense. Content Chunks – What to do
  59. If the content is written in a natural way, and

    if we have internalized concepts such as clarity, univocity, semantic coherence, the "inverted pyramid", etc., then the "chunks" are already present and "optimized" in our content. Content Chunks – What to do
  60. Content Chunks – Useless concept? NO! Knowing that LLMs think

    in chunks rather than entire pages can give us the ability to update existing content by adding a “chunk” instead of creating a completely new one.