
Michael King - Everything You MFs Should Know About Query Fan Out

Tech SEO Connect

December 12, 2025

Transcript

  1. EVERYTHING YOU MFS SHOULD KNOW ABOUT QUERY FAN OUT FROM

    THE GUY THAT MADE THE FIRST TOOL FOR QFO AND MADE IT OPEN SOURCE, BUT NONE OF THE MAJOR SEO SOFTWARE COMPANIES HAVE ONE YET, WTF. Michael King Founder and CEO iPullRank
  2. We’ve learned a lot of internal details about

    how Google works over the past two years and nothing meaningful in our popular software has changed. — Me, CEO, iPullRank
  3. IN OCTOBER 2023, I TOLD Y’ALL RAG WOULD BE OUR FUTURE

    Proof: https://searchengineland.com/how-search-generative-experience-works-and-why-retrieval-augmented-generation-is-our-future-433393
  4. TO SEE HOW IT WORKED, I BUILT A VERSION OF AI OVERVIEWS BEFORE IT WAS RELEASED

    I built Raggle as a RAG pipeline using AvesAPI (a SERP API), LlamaIndex, and the ChatGPT API.
  5. I SAID CTR WOULD DROP 20-60% DEPENDING ON WHAT SPACE YOU’RE IN

    Based on a bunch of data we scraped for 90k keywords, we built a model to predict how much loss people would see from the presence of AI Overviews.
  6. I PREDICTED NERDWALLET WOULD LOSE 30.81% OF ITS TRAFFIC; THE ACTUAL NUMBER IS 37.3%

    Based on data from Semrush, Nerdwallet’s traffic met my prediction a couple of months back.
  7. I drew two very wrong conclusions though. 1:

    I thought you could rank deep in the SERP and be a part of the AIO response.
  8. 2: I PREDICTED MULTIDIMENSIONAL SEARCHES, BUT I DIDN’T KNOW ABOUT

    QUERY FAN OUT… I said the large context windows would mean we’d have to have more comprehensive content to stay in the conversation. I didn’t know we’d need it to even be considered for the core response.
  9. HOW THE QUERY FAN-OUT TECHNIQUE WORKS

    All major LLMs ground their responses through a process called the query fan-out technique, which expands the original search into multiple related, personalized, and contextual variations—all behind the scenes. This allows the LLM to pull content from across an entire category, gathering passages from multiple sources to generate a more complete answer.
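
To make that concrete, here is a minimal fan-out sketch in Python. The model name, prompt wording, and query types are illustrative assumptions, not Google's actual internals:

```python
# Minimal query fan-out sketch: expand one query into typed synthetic
# sub-queries with an LLM. Prompt and model choice are hypothetical.
import json
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def fan_out(query: str, n: int = 8) -> list[dict]:
    prompt = (
        f"Expand the search query '{query}' into {n} synthetic sub-queries. "
        "Mix related, implicit, comparative, and reformulation types. "
        'Return a JSON object: {"queries": [{"query": "...", "type": "..."}]}'
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content)["queries"]

print(fan_out("half marathon training plan for beginners"))
```
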
  10. WHY DO I CARE SO MUCH ABOUT THIS? BECAUSE AI SEARCH IS LIKE A RAFFLE

    You don’t have control over the synthesis pipeline, but if you appear for as many of the inputs as possible, you are improving your chances of being the final choice.
  11. HOW REASONING WORKS IN GOOGLE LLMS Reasoning is the final

    stage in Google’s AI response process—where the system decides what to say and how to say it. After gathering content from many sources, Google uses large language models to weigh the relevance of each passage, compare them, and stitch together a cohesive answer. REASONING IS THE GATEKEEPER OF VISIBILITY
  12. ZIPTIE’S DATA SAYS BEING IN THE TOP 10 GIVES YOU A 25% CHANCE OF APPEARING IN AI SEARCH

    Nobody can run a business on a 25% chance. Clearly there must be something better.
  13. PROFOUND’S DATA SAYS IT’S A 19% CHANCE OF APPEARING IN AI SEARCH

    So, if ranking in Organic Search doesn’t guarantee performance in AI Search, how is it JUST SEO?!
  14. POSITIONS CONTRIBUTE EQUALLY Unlike traditional search where position #1 dominates

    clicks, LLM query fanouts show no position bias. Q1, Q2, and Q3 all show nearly identical citation overlap (~21–22%, p=0.54). In 51.6% of prompts, multiple queries tie for highest overlap. The model treats each query as an equal branch of exploration. THERE’S NO POSITION BIAS IN CHATGPT
  15. LLMs BUILD CONSENSUS ACROSS QUERIES When multiple queries find the

    same source, that's the model building confidence. 66–68% of what each query finds is also found by other queries. Sources found by all 3 queries (9.5% of citations) represent high-consensus, authoritative sources. AIM TO RANK FOR MULTIPLE QUERIES WITH ONE URL
  16. SERP SATURATION DRIVES CITATION PROBABILITY Domains appearing once have a

    9.7% citation rate; domains appearing 7+ times hit 80%+. AIM FOR YOUR DOMAIN TO RANK FOR MULTIPLE QUERIES
  17. DIMINISHING RETURNS AFTER QUERY 2 Each new query adds less

    new domain coverage because it overlaps more with what earlier queries already found. The first query covers 22.5%. Adding a second bumps total coverage to 31.7% (+9.2pp). Adding a third raises it to 38.7% (+7.0pp). MARGINAL GAINS AFTER TWO QUERIES
  18. AN AVERAGE OF 10.7 QUERIES PER PROMPT, BUT UP TO 28 FAN OUT QUERIES PER PROMPT

    Source: https://www.seerinteractive.com/insights/gemini-3-query-fan-outs-research
  19. 28.3% OF CHATGPT-CITED PAGES DON’T RANK FOR ANYTHING IN ORGANIC SEARCH!

    Really, this is more likely a function of major tools not maintaining an index of queries with no search volume. Source: https://ahrefs.com/blog/chatgpts-most-cited-pages/
  20. THE CORE PATTERN BEHIND AI SEARCH: RETRIEVAL-AUGMENTED GENERATION (RAG)

    At the heart of most AI search platforms is retrieval-augmented generation. RAG addresses the fundamental weaknesses of large language models: hallucinations and knowledge cutoffs. By grounding generation in fresh, externally retrieved data, these systems can deliver answers that are both fluent and factual. I introduced the SEO community to this concept 2 years ago: https://searchengineland.com/how-search-generative-experience-works-and-why-retrieval-augmented-generation-is-our-future-433393
  21. AI MODE USES MORE QUERIES AND SOURCES AI Overviews aim

    for brevity and clarity, so synthesis is constrained. Think of it as a single-shot generation pass with a fixed token budget. AI Mode, by contrast, is conversational and persistent. It can run additional retrieval cycles mid-session, incorporate follow-up questions, and adjust the synthesis style on the fly.
  22. AS KRISHNA EXPLAINED, BING DOES A SIMILAR FAN OUT WITH BING ORCHESTRATOR IN PROMETHEUS

    “Prometheus leverages the power of Bing and GPT to generate a set of internal queries iteratively through a component called Bing Orchestrator, and aims to provide an accurate and rich answer for the user query within the given conversation context.” https://blogs.bing.com/search-quality-insights/february-2023/Building-the-New-Bing
  23. THE ANSWER GENERATION PROCESS: HERE’S HOW AI SEARCH GENERATES RESPONSES

    Answer generation is a five-step process from when the user submits a query/prompt to when the system returns a response. That process includes expansion, routing, retrieval, selection, and synthesis.
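
A stub of that five-stage loop, just to show the shape; every function below is a stand-in for what would be a dedicated model or index in a real system:

```python
# Skeleton of the five-stage answer generation process described above.
def expand(prompt: str) -> list[str]:
    # Stage 1 (expansion): rewrite the prompt into synthetic sub-queries.
    return [prompt, f"{prompt} for beginners", f"best {prompt}"]

def route(sub_queries: list[str]) -> dict:
    # Stage 2 (routing): map each sub-query to a source/modality.
    return {q: "web_text" for q in sub_queries}

def retrieve(routing: dict) -> dict:
    # Stage 3 (retrieval): fetch candidate passages per sub-query.
    return {q: [f"passage about {q}"] for q in routing}

def select(candidates: dict) -> list[str]:
    # Stage 4 (selection): score passages and keep the most relevant.
    return [p for passages in candidates.values() for p in passages][:5]

def synthesize(prompt: str, passages: list[str]) -> str:
    # Stage 5 (synthesis): an LLM stitches the passages into one answer.
    return f"Answer to '{prompt}' grounded in {len(passages)} passages."

user_prompt = "half marathon training plan"
print(synthesize(user_prompt, select(retrieve(route(expand(user_prompt))))))
```
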
  24. QUERY/PROMPT IS A STARTING POINT IN AI SEARCH In this

    new retrieval-and-synthesis pipeline, the query you type is not the query the system uses to gather information. Instead, the initial input is treated as a high-level prompt, a clue that sets off a much broader exploration of related questions and possible user needs. The system decomposes the query, rewrites it in multiple forms, generates speculative follow-ups, and routes each variant to different sources.
  25. QUERY EXPANSION AND LATENT INTENT MINING The journey from the

    user’s initial words to the system’s full set of retrieval instructions begins with query expansion. This is the stage where the system broadens the scope of what it is looking for, aiming to cover both the explicit and implicit needs behind the request.
  26. THE QUERY IS BROKEN DOWN INTO NEEDS This classification step

    informs everything that follows, because it sets constraints on the types of sources and content formats that will be considered. INTENT CLASSIFICATION
  27. VARIABLES THAT REPRESENT INFORMATION REQUIREMENTS Slots are variables the system

    expects to fill in order to produce a useful answer. Some slots are explicit. For example “half marathon” sets the distance, “beginners” sets the audience. Others are implicit. The system may want to know the available training timeframe, the runner’s current fitness level, age group, and goal (finish vs. personal record). These slots may not all be populated immediately, but knowing they exist allows the system to search for content that can fill them. SLOT IDENTIFICATION
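
As a toy illustration, here is what slots could look like in code for the half marathon example. The field names and defaults are hypothetical:

```python
# Hypothetical slot model for "half marathon training plan for beginners".
# Explicit slots come straight from the query text; implicit ones are
# inferred and filled later by retrieval.
from dataclasses import dataclass

@dataclass
class QuerySlots:
    distance: str = "half marathon"   # explicit: stated in the query
    audience: str = "beginners"       # explicit: stated in the query
    timeframe: str | None = None      # implicit: weeks available to train
    fitness_level: str | None = None  # implicit: current base mileage
    goal: str | None = None           # implicit: finish vs. personal record

slots = QuerySlots()
unfilled = [k for k, v in vars(slots).items() if v is None]
print("slots the system will try to fill via retrieval:", unfilled)
```
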
  28. THE QUERY IS PLACED IN VECTOR SPACE TO IDENTIFY MORE

    This is where the original query is embedded into a high-dimensional vector space, and the model identifies neighboring concepts based on proximity. These are not random; they are informed by historical query co-occurrence data, clickstream patterns, and knowledge graph linkages. LATENT INTENT PROJECTION
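
A toy version of that projection, using sentence-transformers as a stand-in embedding model; real systems also fold in the co-occurrence, clickstream, and knowledge graph signals mentioned above:

```python
# Toy latent intent projection: embed the query and candidate intents,
# then rank intents by cosine proximity in vector space.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
query = "half marathon training plan for beginners"
intents = [
    "12 week half marathon schedule",
    "how to avoid running injuries",
    "best shoes for long distance running",
    "couch to 5k program",
]
vecs = model.encode([query] + intents, normalize_embeddings=True)
scores = vecs[1:] @ vecs[0]  # cosine similarity; vectors are normalized
for intent, score in sorted(zip(intents, scores), key=lambda t: -t[1]):
    print(f"{score:.3f}  {intent}")
```
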
  29. REWRITES AND DIVERSIFICATIONS The system then generates rewrites and diversifications

    of the original query. These might include narrowing variations (“12-week half marathon plan for beginners over 40”) or format variations (“printable beginner half marathon schedule”). Each rewrite is designed to maximize the chance of finding a relevant content chunk that might not match the original phrasing.
  30. SPECULATIVE SUB-QUESTIONS Finally, the model adds speculative sub-questions. These are

    based on patterns from similar sessions. Including these in the retrieval plan preemptively allows the system to gather the material it will likely need for synthesis.
  31. SUBQUERY ROUTING AND FAN-OUT MAPPING Once the expansion phase has

    produced a portfolio of sub-queries, the system’s job shifts from what to look for to where to look for it. This is the routing stage, and it is where the fan-out map becomes operational. Each sub-query is now a small task in its own right, and the system must decide which source or sources can best satisfy it, which modality is most appropriate for the answer, and which retrieval strategy will be used to get it.
  32. MAPPING SUB-QUERIES TO SOURCES In routing, the system maintains an

    internal mapping of which source types are most appropriate for different query classes. A “plan” might map to long-form text and structured schedules; a “checklist” might map to listicles and product tables; a “routine” might map to video; a “definition” might map to knowledge bases.
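
A hypothetical routing table in code, only to show the shape of the mapping; the class names and source labels are made up:

```python
# Hypothetical routing table: query class -> preferred sources and modality.
ROUTING_PROFILES = {
    "plan":       {"sources": ["long_form_text", "structured_schedules"], "modality": "text"},
    "checklist":  {"sources": ["listicles", "product_tables"], "modality": "text"},
    "routine":    {"sources": ["video_transcripts"], "modality": "video"},
    "definition": {"sources": ["knowledge_bases"], "modality": "text"},
}

def route(sub_query: str, query_class: str) -> dict:
    # Fall back to generic web text when the class is unrecognized.
    profile = ROUTING_PROFILES.get(
        query_class, {"sources": ["web_text"], "modality": "text"}
    )
    return {"sub_query": sub_query, **profile}

print(route("12-week half marathon plan for beginners", "plan"))
```
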
  33. RETRIEVAL STRATEGIES AND COST BUDGETING Routing involves both choosing how

    to retrieve information and managing the cost of retrieval. Retrieval method selection: Some sub-queries work best with sparse retrieval (e.g. BM25) for exact term matches, while others benefit from dense retrieval (embeddings) for semantic similarity. Hybrid approaches combine both to leverage their strengths. Cost-aware budgeting: Each retrieval call uses resources, so systems allocate retrieval effort based on sub-query importance—giving high-priority queries more retrieval passes from multiple sources, and low-priority ones fewer or cheaper calls. This is crucial when using paid APIs or other costly retrieval sources.
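
A minimal hybrid retrieval sketch; rank_bm25 and sentence-transformers are illustrative library choices, and the 50/50 weighting is arbitrary:

```python
# Hybrid retrieval sketch: BM25 for exact term matches, dense embeddings
# for semantic similarity, fused with a simple weighted sum.
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

docs = [
    "12 week half marathon training plan for beginners",
    "strength training routine for distance runners",
    "how to choose running shoes for race day",
]
query = "beginner half marathon schedule"

# Sparse scores (exact term overlap).
bm25 = BM25Okapi([d.split() for d in docs])
sparse = np.array(bm25.get_scores(query.split()))

# Dense scores (semantic similarity).
model = SentenceTransformer("all-MiniLM-L6-v2")
vecs = model.encode(docs + [query], normalize_embeddings=True)
dense = vecs[:-1] @ vecs[-1]

# Min-max normalize each signal before mixing so neither dominates.
def norm(x: np.ndarray) -> np.ndarray:
    return (x - x.min()) / (x.max() - x.min() + 1e-9)

hybrid = 0.5 * norm(sparse) + 0.5 * norm(dense)
print("best match:", docs[int(hybrid.argmax())])
```
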
  34. MATCHING THE ROUTING PROFILE: WHERE OPPORTUNITIES ARE WON OR LOST

    ❏ Content must match the expected modality or it won’t be retrieved ❏ Ensure multi-modal parity (text, structured data, transcripts, etc.) ❏ Place content where the routing logic looks (e.g. API-friendly formats, transcripts for procedural content) ❏ Align content with routing profiles to increase retrieval across fan-out branches
  35. QFO QUERIES ARE IN THE METADATA

    Logged in: https://chatgpt.com/backend-api/f/conversation Logged out: https://chatgpt.com/backend-anon/f/conversation Use Playwright with a stealth plugin or use a bookmarklet.
  36. SAME ENDPOINT SHOWS YOU THE RESULTS USED If you want

    to see what results it used to inform the response, you can also extract that from the conversation endpoint. You can see the domain, URL, title, and snippet in the response.
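
A hedged Playwright sketch of that workflow. The /f/conversation endpoints are undocumented and unstable, so the metadata keys below are observed behavior that can change or break at any time:

```python
# Watch ChatGPT's internal conversation endpoint while you chat and flag
# responses that carry fan-out query and citation metadata. Unofficial
# endpoint: expect breakage, and you may need a stealth plugin.
from playwright.sync_api import sync_playwright

def on_response(response):
    if "/f/conversation" not in response.url:
        return
    try:
        body = response.text()  # an SSE stream of JSON events
    except Exception:
        return  # some responses have no readable body
    # These keys are observed, not documented; adjust as they change.
    for marker in ("search_queries", "snippet", "title"):
        if marker in body:
            print(f"found '{marker}' metadata in {response.url}")

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()
    page.on("response", on_response)
    page.goto("https://chatgpt.com")
    page.wait_for_timeout(120_000)  # chat manually; matches get logged
    browser.close()
```
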
  37. SEE WHAT GEMINI USES FOR GROUNDING

    Ping the Gemini API with the googleSearch() function enabled and see what it returns. If it returns anything, then it’s a grounded response. Spoiler alert: It grounds most prompts at this point.
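
With the google-genai Python SDK, the check looks roughly like this; the model name is an illustrative choice:

```python
# Ask Gemini with the Google Search tool enabled, then inspect the
# grounding metadata. If web_search_queries is populated, the response
# was grounded, and those queries are the fan-out.
from google import genai
from google.genai import types

client = genai.Client()  # expects GEMINI_API_KEY in the environment

resp = client.models.generate_content(
    model="gemini-2.5-flash",  # illustrative model choice
    contents="best half marathon training plan for beginners",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())]
    ),
)

meta = resp.candidates[0].grounding_metadata
if meta and meta.web_search_queries:
    print("grounded; fan-out queries:", meta.web_search_queries)
else:
    print("not grounded")
```
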
  38. EXPAND KEYWORDS WITH QFORIA: THIS HELPS YOU UNDERSTAND YOUR CONTENT GAPS

    Qforia extrapolates synthetic queries based on the initial prompt and gives you their type and reasoning, similar to what Google is doing. https://ipullrank.com/tools/qforia
  39. QFORIA DATA NOT “REAL ENOUGH” FOR YOU? REVERSE INTERSECT THE CITATIONS

    Pull the AIO citation URLs and find out what they rank for using the Semrush API. Then intersect those terms to figure out what the “real” fan-out queries are. We use FetchSERP as the SERP API for this.
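
A hedged sketch of the reverse intersect. The url_organic report and its parameters reflect my reading of the Semrush Analytics API docs, and the citation URLs are placeholders for the AIO citations you pulled:

```python
# Reverse intersect: for each AIO citation URL, pull the keywords it ranks
# for, then intersect the sets to approximate the "real" fan-out queries.
import requests

SEMRUSH_KEY = "YOUR_API_KEY"

def ranking_keywords(url: str) -> set[str]:
    # "url_organic" lists organic keywords for an exact URL; "Ph" is the
    # keyword phrase column. Parameter names per the Semrush API docs.
    resp = requests.get("https://api.semrush.com/", params={
        "type": "url_organic",
        "key": SEMRUSH_KEY,
        "url": url,
        "database": "us",
        "display_limit": 100,
        "export_columns": "Ph",
    })
    lines = resp.text.splitlines()
    return {line.split(";")[0] for line in lines[1:]}  # skip header row

citations = ["https://example.com/page-a", "https://example.com/page-b"]
keyword_sets = [ranking_keywords(u) for u in citations]
print(set.intersection(*keyword_sets))
```
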
  40. I READ A COUPLE PATENTS When I was researching how

    AI Mode works, I read the one to the left (https://patents.google.com/patent/ WO2024064249A1/en) and the “Query response using a custom corpus” patent and realized there are 7 synthetic query types contemplated: Related, Implicit, Comparative, Recent, Personalized, Reformulation, and Entity-Expanded.
  41. IT’S REALLY JUST THREE PROMPT CATS IN A TRENCH COAT

    It’s somewhat shocking that the major tools haven’t built this on top of their bigger datasets.
  42. FIRST WE DETERMINE HOW MANY PROMPTS WE WANT BASED ON THE CHANNEL

    The user inputs whether it’s AI Overviews or AI Mode; the tool determines the number of prompts to generate and gives a reason why it chose that number.
  43. THEN IT SIMULATES THE QUERY FAN OUT AND GENERATES A VARIETY OF QUERIES

    It generates queries of 6 types and gives reasons why. It specifically does not account for geolocation and does not attempt to account for user history. Then it returns the data as JSON.
  44. QFORIA HAS BEEN QUIETLY UPDATED: IT DETERMINES THE EXPECTED TYPES OF CONTENT PER TERM

    The concept of routing content is now used as part of the pipeline, so it tells you the type of content Gemini would consider the best match for that subquery.
  45. CONTENT ROUTING COMES FROM PRE-SELECTED FORMATS Google’s fan out process

    identifies types of content that each query would ideally have. This is the list that it chooses from.
  46. THERE’S A GOOD AMOUNT OF FORKS

    To date, 48 GitHub users have directly forked my code. There’s no telling who just copy and pasted it and applied it to whatever.
  47. THERE’S FOLKS LIKE OTTERLY THAT JUST SLAPPED MY CODE IN AND KEPT IT MOVING

    This is literally my code with a couple of extra buttons and different styling.
  48. THEN THERE’S PEOPLE LIKE TYLER THAT MADE A MUCH MORE THOUGHTFUL AND ACTIONABLE FORK

    Tyler took it further with the comparison against the SERP, some cool visualizations, and some actionable advice on how to optimize your content.
  49. Since the major tools aren’t moving fast enough,

    I guess I have to give you an upgrade. — Me, CEO, iPullRank
  50. HERE ARE THE QFORIA UPGRADES. THANK ME LATER

    I’ve added: Gemini grounding, a reverse intersect function, ChatGPT scraping, and some new charts.
  51. CHUNK DALY Unfortunately, there is still no public index of

    embeddings, but there is value in understanding how chunks across your site perform, so you can upload the keyword and ranking URL and get a chunk explorer.
  52. I’ll keep building and open sourcing until the

    major players give us what we need. — Me, CEO, iPullRank
  53. PROFOUND HAS THE MOST ROBUST CHATGPT QFO TRACKING THAT I HAVE COME ACROSS

    They provide the fan out queries with metrics on the variants and the share of queries.
  54. PROFOUND SHOWS THE QUERY DIFFS OVER TIME

    You can toggle between the latest version of the queries and what they looked like at certain periods in time.
  55. PROFOUND ALSO SHOWS HOW MANY SYNTHETIC QUERIES GET EXECUTED DAY OVER DAY

    I’ve only seen these show up for ChatGPT and Meta AI, but it further reinforces how probabilistic this is.
  56. MARKETBREW HAS A GOOGLE-FOCUSED QFO TOOL THAT HAS SIMILAR FUNCTIONALITY TO QFORIA

    MarketBrew has a Content Booster tool that is what an SEO content editing tool should be. They both generate synthetic queries with Gemini and reverse intersect the rankings.
  57. MARKETBREW INTEGRATES THE QFO DATA IN CONTENT BOOSTER: YOU CAN GENERATE CONTENT TO CLOSE THE GAP

    Content Booster allows you to automate the generation of content to close the semantic gaps between your content and your competitors based on the query fan out data.
  58. TACTICAL IMPLEMENTATION: 5 STEPS TO UNDERSTANDING THE GAPS IN CONTENT

    Relevance Engineering is the process of adjusting content so it’s selected and cited by Google’s AI systems like AI Overviews and AI Mode. Unlike traditional SEO, which focuses on ranking full pages, this approach targets the individual passages and concepts that AI uses to construct responses.
  59. OMNIMEDIA CONTENT STRATEGIES It requires us to think beyond text

    and align with the expected content formats and locations to drive visibility. Build a content ecosystem.
  60. ENTITY-RICH, EMBEDDING-FRIENDLY LANGUAGE

    Write with clearly defined entities. Use consistent terminology. Include modifiers and descriptors: qualifiers like size, function, location, and purpose help differentiate similar entities.
  61. STRUCTURED DATA

    I’m not here to argue with you about how and when it’s used. Just know that LLMs can and do use structured data as part of RAG pipelines.
  62. ON-PAGE ELEMENTS CHECKLIST: SEO BASICS

    ❏ Heading hierarchy ❏ Clean, semantic content ❏ Open Robots.txt ❏ Topical Clustering Yes, a lot of the SEO basic best practices apply here.
  63. WRITING FOR SYNTHESIS To ensure your content performs well in

    modern retrieval systems, it’s essential to structure it in a way that is both machine-readable and human-friendly. Embedding models rely on clean, well-defined “chunks” or semantic units of information to generate precise and relevant results.
  64. GOOGLE’S DISCOVERY ENGINE API REVEALS HOW CHUNKING AND HEADERS CAN BE USED

    Metehan figured out from testing Google’s Discovery Engine API that chunks can be up to 500 tokens and can include the preceding header hierarchy to inform topical context. Source: https://metehan.ai/blog/reverse-engineering-google-ai-mode/
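
To mirror that in your own pipeline, a simple header-aware chunker could look like this; word count stands in for tokens, so swap in a real tokenizer:

```python
# Header-aware chunking per the Discovery Engine finding: cap chunks near
# 500 tokens and prepend the heading hierarchy for topical context.
def chunk_with_headers(sections: list[tuple[list[str], str]],
                       max_tokens: int = 500) -> list[str]:
    """sections: [(["H1", "H2", ...], "body text"), ...]"""
    chunks = []
    for headers, body in sections:
        prefix = " > ".join(headers)  # e.g. "Training Guide > Week 1"
        words = body.split()  # crude token proxy; use a tokenizer in prod
        for i in range(0, len(words), max_tokens):
            chunks.append(f"{prefix}\n{' '.join(words[i:i + max_tokens])}")
    return chunks

sections = [(["Training Guide", "Week 1"],
             "Start with three easy runs and one rest day. " * 150)]
for chunk in chunk_with_headers(sections):
    print(chunk[:70], "...")
```
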
  65. RELEVANCE SCORE PASSAGES: USE RELEVANCE DOCTOR

    We built a simple tool that scores passages of content in a layout-aware format. This will improve your ability to be considered and extracted. https://ipullrank.com/tools/relevance-doctor
  66. PASSAGE OPTIMIZATION Optimizing for extractability means that content should be

    organized into easily defined sections. Headings and subheadings should be clear, and passages should answer queries directly and succinctly. The combination of query/passage is defined as a semantic unit, and these units are used to power AI search.
  67. Semantic Triples Semantic triples help search engines understand context better

    by identifying entities, establishing connections, and building a web of interconnected concepts, which provide richer contextual information beyond just keywords. These Subject–Predicate–Object triples are the building blocks of knowledge graphs, which allow AI systems to understand relationships between entities, enabling more intelligent search results, factual verification, and structured data for AI overviews.
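
In code, triples are just (subject, predicate, object) tuples, and the example facts below are illustrative; the takeaway is to write passages that decompose this cleanly:

```python
# Subject-Predicate-Object triples, the building blocks of a knowledge
# graph. A toy lookup shows how relationships become queryable facts.
triples = [
    ("Qforia", "generates", "synthetic queries"),
    ("query fan-out", "expands", "the original search"),
    ("RAG", "grounds", "LLM responses"),
]

def facts_about(subject: str) -> list[tuple[str, str]]:
    # Everything the graph "knows" about one entity.
    return [(p, o) for s, p, o in triples if s == subject]

print(facts_about("Qforia"))
```
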
  68. PROVIDE UNIQUE, HIGHLY SPECIFIC, OR EXCLUSIVE INSIGHTS Unique content or

    proprietary data increases the likelihood that your page is retrieved and cited as authoritative in RAG pipelines.
  69. TOOLS LIKE MARKETBREW ALSO HELP YOU DO THIS We have

    an AI Overview simulator that runs a RAG pipeline and takes updated content to see what would be returned. It also makes recommendations on what to do in order to improve the retrievability. With so much being open source, there is no reason that all SEO software companies cannot offer something similar. MarketBrew is a company that focuses on this idea across SEO generally. SIMULATE YOUR RANKINGS BASED ON CHANGES
  70. IN THE SEO TOOLS SPACE, N8N IS A MUST-HAVE TO CLOSE THE GAPS
  71. I prefer open source options so I can customize them

    to my needs and I don’t have to worry about data privacy. — Me, CEO, iPullRank
  72. OLLAMA

    It’s an open source platform and API for running open source models. I use it for pretty much everything y’all use the closed models for unless I need fidelity with Google. https://ollama.com
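
Getting a local model talking takes a few lines with the ollama Python package; this assumes the Ollama server is running, the model has been pulled, and the model name is an illustrative choice:

```python
# Minimal local LLM call via Ollama (pip install ollama). The server must
# be running locally; run `ollama pull llama3.1` first if needed.
import ollama

resp = ollama.chat(
    model="llama3.1",  # illustrative; any pulled model works
    messages=[{
        "role": "user",
        "content": "Expand 'half marathon training plan' into 5 sub-queries.",
    }],
)
print(resp["message"]["content"])
```
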
  73. Ollama Has a Chat Interface

    It features a web search mode to perform RAG and give you similar responses to the major chatbots. Much of what you can do in the major chatbots, but open source.
  74. It Downloads Models Automatically

    If you don’t have the model you select, it will go get it for you.
  75. It’s Fast If You Have a Decent GPU

    Model size matters too, but you can do most of it with mid-sized models.
  76. SCREAMING FROG + OLLAMA

    I generate embeddings and qualitatively assess content as I crawl, all on my own GPU. Generative features in crawl data for free.
  77. I RUN N8N OPEN SOURCE TOO

    It’s really easy to get up and running with n8n locally or on your own hosted environment. Just install NodeJS and run npx n8n.
  78. N8N HAS 1121 INTEGRATIONS

    I generate embeddings as I crawl, assess content, take screenshots and analyze as I crawl. All on my own local GPU.
  79. You Can Build a Full Microservice

    You can have n8n act as an API that you can easily integrate with Google Workspace or any other application. Use n8n to operate a backend.
  80. FLEXIBLE HUMAN IN THE LOOP IS ONE OF N8N’S BEST

    FEATURES You can set up workflows to ping you in a variety of places and only move forward based on that.
  81. USE THE TEMPLATES LOCALLY

    The templates available at https://n8n.io/workflows can be imported into my local instance with a single click. Use them as a place to start with building your own workflows.
  82. Automated AIO Analysis This extracts AIO details and generates recommendations

    on how to improve your content to appear in AIOs. https://n8n.io/workflows/4822-extract-and-analyze-google-ai-overviews-with-llm-for-seo-recommendations/
  83. AI WORKFLOW BUILDER FEATURE MAKES IT EASY

    N8N has a chat-driven workflow builder feature that allows you to describe the workflow you want, and it configures it end to end.
  84. IF YOU’RE CONSIDERING AGENTS, CHECK OUT CREW.AI AND BUILD FULLY FUNCTIONAL AGENTS FROM CHAT

    Crew is available as a cloud-based application or open source. You can code your agents from scratch or describe what you want in the chat interface and quickly develop a series of agents to help you scale your work. https://www.crewai.com
  85. REMEMBER THESE FIVE THINGS. If You Don’t Remember Anything Else…

    1. Learn how the systems work so you can discover new opportunities. 2. It will take more than SEO to get you visibility in the future. 3. Search technology and behavior have changed irrevocably. 4. Most of your SEO tools will not help you get where you need to go. 5. This is an opportunity to define the future.
  87. SCREAMING FROG + OLLAMA

    I generate embeddings as I crawl, assess content, take screenshots and analyze as I crawl. All on my own local GPU.
  88. 20 CHAPTERS OF PURE 🔥🔥🔥

    Everything you need to know about how AI Search works. No vagueness.
  89. THANK YOU | Q&A

    Mike King, Founder / CEO @iPullRank, [email protected]. Tap in with us: ipullrank.com. Award Winning, #GirlDad.