Trashcat's guide to black boxes: tech seo for LLMs

trashcats_guide_to_black_boxes Technical SEO Strategies for LLMs

audience_guide
• Not a robot; speaks bot
• Director of Technical SEO at Cox Automotive
• Author of Rich Snippets
• Excitable nerd from New York
• /in/jamie-indigo
• not-a-robot.com
• Proud pet parent

.boomer .tank

meet_keyleth Assumptions
• Kitten
• Domesticated
• Lost pet (chipped)
• Innocent and affectionate

meet_keyleth Expert Evaluation
• Adult
• Stray
• Ravenous
• Breed: shorthair domestic terrorist
• Favorite toys: airpods and grasshoppers
did do that. will do it again

new friends aren't always what they seem

geo?? aeo?? aio?? llmo?? don't care what you call it.
bots are made of code. docs pls

readme.lol shaping the battlefield is a PR and psyops tactic

is still a bot. am still tech seo hunt.exe

1. smart_people
2. log_files
3. bot_manager
4. analytics
5. interface
6. beg_borrow_build
what_big_eyes

bot_covenant let me crawl you. I'll:
1. crawl politely
2. declare who i am
3. send you traffic
assumptions (Fool me once, Kiki)

1. promise(polite)
RFC 9309 - Robots Exclusion Protocol; All about robots | Google Search Central Blog

Percentage of AI queries citing Reddit Credit: Josh Blyskal, LinkedIn
case study: crawl_politely

robots.txt reports that is lie Reddit disallows all crawlers after
$60M deal with Google Feb 24 Credit: Josh Blyskal, LinkedIn

chatgpt yeah im not doing that Credit: reddit.com/robots.txt via RRT
disallow: /airpods

fwd: fwd: fwd: plausible deniability
Common Crawl's massive dataset is more than 9.5 petabytes large and makes up a significant portion of the training data for many Large Language Models (LLMs). 80% of GPT-3 tokens (a representation unit of text data) stemmed from Common Crawl.
Mozilla Report: How Common Crawl’s Data Infrastructure Shaped the Battle Royale over Generative AI

"65% of our most expensive traffic comes from bots"
wikimedia foundation, How crawlers impact the operations of the Wikimedia projects
meta name = "ravenous"

check(polite) AI Insights, Cloudflare Radar, 21 Aug - 18 Sep
2025

Skype is retiring in May 2025, Microsoft declare_name May 2025


let me love you
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Gemini-Deep-Research; +https://gemini[dot]google/overview/deep-research/) Chrome/135.0.0.0 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko; compatible; GoogleAgent-Mariner; +https://developers.google[dot]com/search/docs/crawling-indexing/google-agent-mariner) Chrome/135.0.0.0 Safari/537.36
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GoogleAgent-Search; +https://developers.google[dot]com/search/docs/crawling-indexing/google-agent-search) Chrome/114.0.0.0 Safari/537.36
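Some of the user agents above declare themselves with a "compatible; <token>; +<info URL>" segment and some just look like a browser. A minimal sketch of telling the two apart, assuming that convention holds; the pattern and the function name `declared_identity` are illustrative, not anything a vendor documents. A declared token is only a claim and can be spoofed.

```python
import re

# Illustrative pattern for the "compatible; <token>; +<info URL>" convention
# visible in the declared agents above. Browser-like strings will not match.
DECLARED_AGENT = re.compile(
    r"compatible;\s*(?P<token>[^;)]+);\s*\+(?P<info_url>[^)\s]+)"
)


def declared_identity(user_agent: str):
    """Return (token, info_url) if the user agent declares itself, else None."""
    match = DECLARED_AGENT.search(user_agent)
    if match is None:
        return None
    return match.group("token").strip(), match.group("info_url")


# Two of the strings from the slide: one undeclared, one declared.
BROWSER_LIKE = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36"
)
DECLARED = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
    "Gemini-Deep-Research; +https://gemini[dot]google/overview/deep-research/) "
    "Chrome/135.0.0.0 Safari/537.36"
)

if __name__ == "__main__":
    print(declared_identity(BROWSER_LIKE))
    print(declared_identity(DECLARED))
```

Anything that returns `None` here is a candidate for a closer look in the bot manager, not proof of a human.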

send_traffic AI in Search is driving more queries and higher
quality clicks, Google Keyword Blog

Does LLM Traffic Convert Better Than Organic? A New Data-Backed Study, Will Guevara
but does it, liz?

"This data is in contrast to third-party reports that inaccurately
suggest dramatic declines in aggregate traffic — often based on flawed methodologies, isolated examples, or traffic changes that occurred prior to the roll out of AI features in Search."

THEN HELP US LEARN BETTER

bot_covenant let me crawl you. I'll:
1. crawl politely
2. declare who i am
3. send you traffic
do it whether you like it or not

ai_search ask ai for things, it
1. is search engine
2. does the thing I ask
3. uses search engine mechanics
assumptions

search engine = information retrieval system
ai search = model trained on corpus + retrieval augmentation (sometimes)

AI "ranking" is probabilistic. Your brand has a tendency to
exist

but ai rank tracking…
AI rank tracking
• uses synthetic personas as part of the prompt
• these are not the same as user embedding vectors
• intentionally resets persistent memory
AI search
• is intent aware
• has ambient persistent memory
• is personalized using the same user embedding vectors

analytics.jk
1. Analytics referrer
2. User-initiated UAs in log files (ex: chatgpt-user)
3. Text fragments in landing pages (#:~:text=)
These are you appearing in AI search, my friend.
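Text fragments are the fiddly one of the three signals: the part after `#` is never sent in the HTTP request, so server logs will not contain it. A minimal sketch of pulling the quoted text out of a landing URL, assuming the full URL (fragment included) was captured client-side by your analytics; the function name `text_fragments` is made up for illustration.

```python
from urllib.parse import unquote, urlsplit

# Delimiter that introduces a fragment directive, as in "#:~:text=...".
FRAGMENT_DIRECTIVE = ":~:"


def text_fragments(landing_url: str) -> list[str]:
    """Return the decoded text= directives found in a landing URL, if any."""
    fragment = urlsplit(landing_url).fragment
    if FRAGMENT_DIRECTIVE not in fragment:
        return []
    directive = fragment.split(FRAGMENT_DIRECTIVE, 1)[1]
    texts = []
    for part in directive.split("&"):
        if part.startswith("text="):
            texts.append(unquote(part[len("text="):]))
    return texts


if __name__ == "__main__":
    # Hypothetical landing URL for illustration.
    print(text_fragments("https://example.com/page#:~:text=blue%20book%20value"))
```

The decoded text is whatever passage the AI interface chose to highlight, which makes it a rough hint at which chunk of the page got cited.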

Effective Resource restriction
1. If you're disallowing it for an AI crawler, repeat the statement for CCBot
2. Use X-Robots-Tag directives to keep non-HTML resources from being indexed independently while still letting them contribute to rendered page content.
3. Seriously block your API endpoints for non-rendering bots
4. Be sure to block only non-critical resources (that is, resources that aren't important to understanding the meaning of the page).
5. Block personalization resources used for returning users to conserve crawl budget
6. Hide logins from the crawl path
Robots.txt Introduction and Guide | Google Search Central | Documentation
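Point 1 is easy to forget, so it helps to test it. A minimal sketch using Python's standard `urllib.robotparser` to confirm that a path blocked for one AI crawler is also blocked for CCBot. The robots.txt content, the `/private/` path, and the helper name `blocked_agents` are all hypothetical; note that this parser's matching is simpler than what RFC 9309 or any individual crawler implements.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the GPTBot rule is repeated for CCBot.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private/

User-agent: CCBot
Disallow: /private/

User-agent: *
Allow: /
"""


def blocked_agents(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Map each agent token to True if robots_txt disallows it from url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: not parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    url = "https://example.com/private/report.html"
    print(blocked_agents(ROBOTS_TXT, ["GPTBot", "CCBot", "Googlebot"], url))
```

If the AI crawler comes back blocked and CCBot does not, the content can still end up in a Common Crawl snapshot.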

do_the_thing
Simple experiment:
1. Ask for information on the page
2. Check logs for requests
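Step 2 can be sketched as a small log filter: after asking an AI interface about a page, look for requests to that path from known AI agent tokens. This assumes Combined Log Format, which may not match your server; the sample lines, the `/pricing` path, and the ChatGPT-User string in them are invented for illustration. The token list comes from the crawler names elsewhere in this deck.

```python
import re

# Combined Log Format (assumption; adjust to your server's format).
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

AI_TOKENS = (
    "ChatGPT-User", "OAI-SearchBot", "GPTBot", "Claude-User",
    "Claude-SearchBot", "ClaudeBot", "PerplexityBot",
    "Meta-ExternalAgent", "Bytespider", "CCBot",
)


def ai_requests_for(path: str, log_lines: list[str]) -> list[tuple[str, str, int]]:
    """Return (timestamp, agent token, status) for AI agent hits on an exact path."""
    hits = []
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match is None or match.group("path") != path:
            continue
        user_agent = match.group("ua").lower()
        token = next((t for t in AI_TOKENS if t.lower() in user_agent), None)
        if token is not None:
            hits.append((match.group("time"), token, int(match.group("status"))))
    return hits


# Invented sample lines: one AI agent request, one ordinary browser request.
SAMPLE_LOG = [
    '203.0.113.7 - - [18/Sep/2025:10:15:32 +0000] "GET /pricing HTTP/1.1" 200 5123 '
    '"-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ChatGPT-User/1.0)"',
    '198.51.100.4 - - [18/Sep/2025:10:16:01 +0000] "GET /pricing HTTP/1.1" 200 5123 '
    '"-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"',
]

if __name__ == "__main__":
    print(ai_requests_for("/pricing", SAMPLE_LOG))
```

Comparing the hit timestamps with the moment you sent the prompt shows whether the answer came from a live fetch or from something already stored.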

just_let_ai_make_your_strategy_bro

polly want a bad idea Mark Williams-Cook, LinkedIn

nothing AI says is true. it's just probable

render_mechanics
AI Crawler | Requests | Renders
Google (ecosystem) | ✅ | ✅
Claude (Claude-SearchBot, Claude-User, Claude-Web, ClaudeBot) | ✅ | 🤔
OpenAI (OAI-SearchBot, ChatGPT-User, GPTBot, ChatGPT Agent) | ✅ | ❌
Meta (Meta-ExternalAgent) | ✅ | ❌
Perplexity (PerplexityBot) | ✅ | ❌
ByteDance (Bytespider) | ✅ | ❌
Common Crawl (ccbot) | ✅ | ❌
Rise of the AI Crawler, Vercel; reverified 29 July 2025 (thank you, Ryan Siddle)

ai_index

“The decision on which pages to crawl is primarily influenced by the relevance of the title, the content within the snippet, the freshness of the information, and the credibility of the domain.”
ChatGPT support team
How does ChatGPT Search select the sources to crawl? Jérôme Salomon

AI Chatbot Market Share Worldwide
am one squishy mortal. would read dev docs tho

snippet_peekaboo

1. URL
2. Title
3. Snippet (usually meta description)
4. Ranking position
5. Metadata

event: delta
data: {"v": [
  {
    "type": "search_result_group",
    "domain": "www.kbb.com",
    "entries": [
      {
        "type": "search_result",
        "url": "https://www.kbb.com/cars-for-sale/all/2025/nissan/frontier/pro-4x",
        "title": "2025 Nissan Frontier PRO-4X for Sale",
        "snippet": "... 3572 Nissan Frontier cars for sale, including a New 2025 Nissan Frontier PRO-4X and a Used 2025 Nissan Frontier PRO-4X ranging in price from $13795 to $68619.",
        "ref_id": null,
        "pub_date": null,
        "attribution": "www.kbb.com"
      }
    ]
  }
]}
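The event above is plain JSON once the `data:` prefix is removed, so the fields the model sees can be pulled out directly. A minimal sketch, assuming you have copied one `data:` line out of the network panel yourself; the payload shape is taken from the example above and is not a documented API, so it may change. The snippet in the sample is shortened.

```python
import json


def search_results(sse_data_line: str) -> list[dict]:
    """Extract url/title/snippet from one 'data: {...}' line shaped like the example."""
    payload = json.loads(sse_data_line.removeprefix("data:").strip())
    results = []
    for group in payload.get("v", []):
        if group.get("type") != "search_result_group":
            continue
        for entry in group.get("entries", []):
            results.append(
                {
                    "url": entry.get("url"),
                    "title": entry.get("title"),
                    "snippet": entry.get("snippet"),
                }
            )
    return results


# One data line modeled on the slide's payload (snippet abbreviated here).
SAMPLE_DATA_LINE = (
    'data: {"v": [{"type": "search_result_group", "domain": "www.kbb.com", '
    '"entries": [{"type": "search_result", '
    '"url": "https://www.kbb.com/cars-for-sale/all/2025/nissan/frontier/pro-4x", '
    '"title": "2025 Nissan Frontier PRO-4X for Sale", '
    '"snippet": "... 3572 Nissan Frontier cars for sale", '
    '"ref_id": null, "pub_date": null, "attribution": "www.kbb.com"}]}]}'
)

if __name__ == "__main__":
    for result in search_results(SAMPLE_DATA_LINE):
        print(result["title"], "->", result["url"])
```

Comparing the extracted snippet with the page's meta description is a quick way to see what text is standing in for the page.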

but what do i do

leverage analytics and logs as a proxy for AI rank
1. Referring
2. User-initiated crawls
3. Fragments

want new pages crawled? link from homepage
“So for the most part, for example, we would refresh crawl the homepage, I don’t know, once a day, or every couple of hours, or something like that. And if we find new links on their home page then we’ll go off and crawl those with the discovery crawl as well. And because of that you will always see a mix of discover and refresh happening with regard to crawling. And you’ll see some baseline of crawling happening every day. But if we recognize that individual pages change very rarely, then we realize we don’t have to crawl them all the time.”
English Google SEO office-hours from January 7, 2022
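If discovery starts from homepage links, it is worth checking that new URLs really appear there as plain `<a href>` links in the raw HTML, since most AI crawlers do not render. A minimal sketch using the standard-library HTML parser; the helper name `missing_from_homepage` and the example markup are hypothetical, and it inspects only the HTML you pass in (no fetching, no JavaScript).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from <a href> tags in raw (unrendered) HTML."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            self.links.add(urljoin(self.base_url, href))


def missing_from_homepage(homepage_html: str, base_url: str, new_urls: list[str]) -> list[str]:
    """Return the new URLs that are not linked via <a href> in the homepage HTML."""
    collector = LinkCollector(base_url)
    collector.feed(homepage_html)
    return [url for url in new_urls if url not in collector.links]


# Hypothetical homepage: one crawlable link, one click handler that is not a link.
SAMPLE_HTML = (
    '<a href="/new-page">New</a>'
    "<span onclick=\"go('/hidden')\">Hidden</span>"
)

if __name__ == "__main__":
    print(
        missing_from_homepage(
            SAMPLE_HTML,
            "https://example.com/",
            ["https://example.com/new-page", "https://example.com/hidden"],
        )
    )
```

URLs reachable only through click handlers show up as missing, which is how a non-rendering crawler would experience them too.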

John Mueller, LinkedIn meta descriptions are hot again

performance still matters
• Sites with CLS ≤ 0.1 recorded a 29.8% higher inclusion rate in generative summaries compared with sites above this threshold.
• Pages delivering LCP ≤ 2.5 seconds were 1.47 times more likely to appear in AI outputs than slower pages.
• Crawlers abandoned requests for 18% of pages larger than 1 MB of HTML, highlighting the need for lean markup.
• TTFB under 200 ms correlated with a 22% increase in citation density, particularly when paired with robust caching strategies.
• This study shows that performance improvements do more than enhance user experience. They directly increase the probability of being cited or surfaced by AI systems.
Technical SEO for AI Search - SALT.agency®
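The thresholds quoted above can be turned into a quick triage check. A minimal sketch, assuming you already have per-page metrics from your own monitoring; the `PageMetrics` class and `performance_flags` function are hypothetical, and treating 1 MB as 1,000,000 bytes is an assumption. The study reports correlations, so a clean result is no guarantee of citation.

```python
from dataclasses import dataclass


@dataclass
class PageMetrics:
    """Hypothetical container for metrics you collect elsewhere."""
    cls: float          # Cumulative Layout Shift (unitless)
    lcp_seconds: float  # Largest Contentful Paint, in seconds
    ttfb_ms: float      # Time to First Byte, in milliseconds
    html_bytes: int     # Size of the HTML document, in bytes


def performance_flags(metrics: PageMetrics) -> list[str]:
    """List which of the quoted thresholds a page misses."""
    flags = []
    if metrics.cls > 0.1:
        flags.append("CLS above 0.1")
    if metrics.lcp_seconds > 2.5:
        flags.append("LCP above 2.5 s")
    if metrics.ttfb_ms >= 200:
        flags.append("TTFB not under 200 ms")
    if metrics.html_bytes > 1_000_000:  # assumption: 1 MB = 1,000,000 bytes
        flags.append("HTML larger than 1 MB")
    return flags


if __name__ == "__main__":
    slow_page = PageMetrics(cls=0.25, lcp_seconds=3.1, ttfb_ms=450, html_bytes=1_400_000)
    print(performance_flags(slow_page))
```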

HTTP 499 status code indicates that the client closed the connection before the server could respond. In this context, the client is a bot sent by ChatGPT, terminating the request due to delayed server responses.
Real-time genAI search can't afford slow pages
ChatGPT has no time for your slow pages ⏱ | Jérôme Salomon
499 = i give up
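To see how often each agent gives up, tally the share of 499 responses per agent. A minimal sketch that takes already-parsed `(agent, status)` pairs, assuming you extract them from your own logs; the function name `give_up_rate` and the sample records are made up. 499 is a non-standard code logged by nginx, so other servers may record a client disconnect differently.

```python
from collections import Counter


def give_up_rate(records: list[tuple[str, int]]) -> dict[str, float]:
    """Return the fraction of requests per agent that ended with status 499."""
    totals = Counter()
    gave_up = Counter()
    for agent, status in records:
        totals[agent] += 1
        if status == 499:
            gave_up[agent] += 1
    return {agent: gave_up[agent] / totals[agent] for agent in totals}


# Invented (agent token, status) pairs for illustration.
SAMPLE_RECORDS = [
    ("ChatGPT-User", 200),
    ("ChatGPT-User", 499),
    ("GPTBot", 200),
]

if __name__ == "__main__":
    print(give_up_rate(SAMPLE_RECORDS))
```

A high rate for user-initiated agents on specific templates points at the slow pages to fix first.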

tech seo still applies
• crawlable links
• semantic html
• consistent urls
• not rendered, not discovered

how to think like revolutionary tech w/ ancient stack

If our existing queries move to AI conversations, can we answer?

question: What is the value according to the Blue Book?
original_query: blue book value
score: 6
assessment: partially_answered
explanation: The content mentions that Kelley Blue Book provides various values (Trade In Range and Private Party Value) but does not specify what the 'Blue Book Value' is or how it is determined in a clear manner.
suggestions: Clarify what the 'Blue Book Value' specifically refers to and how it is calculated, including examples of different types of values (trade-in vs. private party).

question: How can I find out the value of my car?
original_query: car value
score: 9
assessment: fully_answered
explanation: The content clearly explains that users can get their car's value by providing their VIN or license plate and that they will receive an email with the value within 24 hours. This is a direct and actionable answer.
suggestions: -

question: How much is my car worth according to Kelley Blue Book?
original_query: how much is my car worth
score: 7
assessment: partially_answered
explanation: The content implies that users can find out their car's worth through Kelley Blue Book but does not provide a direct answer or method to obtain that specific value without further context.
suggestions: Include a more explicit statement or example of how to find out the worth of a specific car using Kelley Blue Book.

concept representation content evaluator - a deepcrawl skunkwork

concept representation improvements

AEO Chunk optimization tool

Uniqueness Optimization Tool

.you_made_me_do_this turning to hackers for ai transparency

https://not-a-robot.com/ai-transparency
Each doc has:
1. Assertions (Rules, Constraints, and Stated Facts)
2. Functionalities (The AI's Capabilities)
3. Testing strategies using Chrome DevTools to verify

i really just want tech doc pls
