Bots and Logs - SMX Munich 2026

Amanda King

March 19, 2026

Transcript

  1. Logs Kinda Lie, Bots Probably Do. Amanda King @ FLOQ Consulting / Topic Compass, SMX Munich, 11 Mar 2026
  2. What’s what
    1. Current state of bots
    2. Crawling things
    3. What we should do about it
    4. Three takeaways
    5. Who’s this human?
  3. So only about half of the traffic to your website is human… and not all of the bots are going to be filtered out by Google Analytics
  4. Bot-enabled attacks make it easy for bad actors to scale, particularly against API endpoints
    https://cpl.thalesgroup.com/sites/default/files/content/campaigns/badbot/2025-Bad-Bot-Report.pdf
  5. Bytespider (TikTok) is by far the most leveraged bot for malicious attacks. Don’t ask me how.
    https://cpl.thalesgroup.com/sites/default/files/content/campaigns/badbot/2025-Bad-Bot-Report.pdf
  6. Training is still one of the most common reasons LLM/AI bots crawl
    https://radar.cloudflare.com/embed/AiBotTrafficXY?dateRange=52w&ref=%2Fai-insights
  7. Only 3% of the top 10K domains disallow GPTBot; ~40% block at least one bot
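Those figures are about robots.txt rules. To see how a given robots.txt treats a specific crawler, Python’s standard-library `urllib.robotparser` can evaluate it offline. A minimal sketch: the robots.txt content and URLs are made up for illustration, and robots.txt is advisory, so it only keeps out the bots that choose to honour it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks OpenAI's GPTBot site-wide and keeps
# one private folder off-limits to every other crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content from a string (no network fetch)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(parser: RobotFileParser, product_token: str, url: str) -> bool:
    """Ask whether the parsed rules let `product_token` fetch `url`.

    Pass the product token ("GPTBot"), not the full user-agent string:
    robotparser only compares the text before the first "/", which for a
    full UA string is just "Mozilla".
    """
    return parser.can_fetch(product_token, url)


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for token in ("GPTBot", "ClaudeBot", "Googlebot"):
        print(token, allowed(rp, token, "https://example.com/blog/post"))
```

Parsing from a string keeps the check reproducible; `RobotFileParser.set_url()` plus `read()` would fetch a live file instead.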
  8. Fewer than 10% of businesses report fully scaled use of agentic AI
    https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai
  9. In the US, most growth appears in mid-length queries, particularly 6–9 word searches. Searches of 15+ words show more volatility than any other query length.
    https://datos.live/report/state-of-search-q4-2025/
  10. The content LLMs are trained on is not your website as you know it. It’s typically:
    • Stripped of HTML tags
    • Stripped of boilerplate
    • Cleaned of stray artefacts
    • Captured without rendering JavaScript (usually)
    • Not current
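For a rough feel of what survives that kind of preprocessing, you can strip a page the same way: drop the tags, skip `<script>`/`<style>`, and never run JavaScript. A minimal sketch with Python’s built-in `html.parser`; it is an approximation for illustration, not how any particular LLM provider actually cleans pages, and the sample page is hypothetical.

```python
from html.parser import HTMLParser

# Elements whose content never shows up as page text. <noscript> is
# deliberately NOT skipped: a client that doesn't run JavaScript gets it.
SKIP_TAGS = {"script", "style", "template"}


class TextOnly(HTMLParser):
    """Collect text nodes, ignoring anything inside SKIP_TAGS."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def text_only(html: str) -> str:
    """Return the page's text with tags removed and no JavaScript executed."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical product page: the price is only injected by JavaScript.
PAGE = """<html><head><style>.price{color:red}</style></head><body>
<h1>V-neck jumper</h1>
<div class="price"></div>
<script>document.querySelector('.price').textContent = '49 EUR';</script>
</body></html>"""

if __name__ == "__main__":
    print(text_only(PAGE))
```

The price never appears in the output: anything that only exists after JavaScript runs is invisible to a pipeline like this.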
  11. Plus, LLMs don’t always crawl the “live” version of your website… even when they tell you they are
  12. It’s about patterns and the implicit understanding of training data
    • Structured data and entity recognition are how you get on the shortlist to be part of the RAG pipeline, or how you’re considered a “good result” in the training data in the first place
    • When you codify how you talk about your brand, and that’s consistent across channels, across media, across wherever you can control and influence, that’s a pattern LLMs can recognise and interpolate
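One common way to codify a brand as an entity is schema.org `Organization` markup in JSON-LD, with `sameAs` listing the brand’s other profiles. A minimal sketch that builds the tag in Python; the choice of `Organization`, the property set and every value are illustrative assumptions, not something the talk prescribes.

```python
import json


def organization_jsonld(name: str, url: str, logo: str, same_as: list[str]) -> str:
    """Build a schema.org Organization snippet as a JSON-LD <script> tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "logo": logo,
        # Other profiles for the same entity. Reuse one canonical list
        # everywhere the markup appears so the signal stays consistent.
        "sameAs": list(same_as),
    }
    body = json.dumps(data, indent=2, ensure_ascii=False)
    return f'<script type="application/ld+json">\n{body}\n</script>'


if __name__ == "__main__":
    # Placeholder values; substitute the brand's real name, URLs and profiles.
    print(organization_jsonld(
        "Example Co",
        "https://www.example.com/",
        "https://www.example.com/logo.png",
        ["https://www.linkedin.com/company/example-co",
         "https://www.youtube.com/@exampleco"],
    ))
```

Generating the snippet from one function (or one data file) is a simple way to keep the brand description identical across every template that emits it.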
  13. We can understand (some of) it in Google Analytics: filter 'Session source/medium' to match the regex
    .*meta.ai.*|.*perplexity.*|.*claude.*|.*mistral.*|.*gemini.*|.*chatgpt.*|.*copilot.*|.*manus.*|.*huggingface.co.*|.*grok.*|.*deepseek.*|.*you.com.*|.*poe.*|.*character.ai.*
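Before relying on that filter in a report, you can run the same pattern locally against sample values. A minimal sketch in Python; the `.*` wrappers suggest the pattern is meant to match the whole source/medium value, so it uses `re.fullmatch`. The sample values are made up, and Python’s regex engine is not the one GA uses, so treat this as a rough check.

```python
import re

# Pattern from the slide, with the stray spaces from the transcript removed.
# Dots are unescaped, exactly as on the slide, so "." matches any character.
AI_SOURCE_PATTERN = re.compile(
    r".*meta.ai.*|.*perplexity.*|.*claude.*|.*mistral.*|.*gemini.*"
    r"|.*chatgpt.*|.*copilot.*|.*manus.*|.*huggingface.co.*|.*grok.*"
    r"|.*deepseek.*|.*you.com.*|.*poe.*|.*character.ai.*"
)


def is_ai_source(source_medium: str) -> bool:
    """True if the whole 'source / medium' value matches the pattern."""
    return AI_SOURCE_PATTERN.fullmatch(source_medium) is not None


if __name__ == "__main__":
    for value in ("chatgpt.com / referral", "perplexity.ai / referral",
                  "google / organic", "(direct) / (none)"):
        print(f"{value!r:30} {is_ai_source(value)}")
```

Short fragments such as `.*poe.*` will also match unrelated sources that happen to contain those letters (a poetry site, say), so it is worth checking what the filter actually catches.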
  14. But first, a question: how many folks have read their site logs? How recently?
  15. But then, how frequently do they access your site?
    • Access logs
    • Usually from your server, sometimes from your CDN (e.g. Cloudflare)
    • Probably a request to your development team, or to your engineers if they sit in a separate team
    • 3 months of data for a top-level view
    • 6–12 months for a trend
    • Use Screaming Frog to aggregate the data (or build it yourself with Claude’s help)
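If you build the aggregation yourself, a first pass can be a count of requests per known AI user agent per day, which then rolls up into the 3-month and 6–12-month views. A minimal sketch, assuming logs in the standard Apache/Nginx combined format; `BOT_TOKENS` is an illustrative, incomplete sample of crawler tokens, and since any client can send any user agent, these counts are what requests *claim* to be.

```python
import re
from collections import Counter
from datetime import datetime

# Illustrative, incomplete list of AI crawler/agent user-agent tokens.
BOT_TOKENS = ("GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot",
              "PerplexityBot", "Bytespider", "CCBot")

# Timestamp and user agent from an Apache/Nginx "combined" format line.
COMBINED_TAIL = re.compile(
    r'\[(?P<ts>[^\]]+)\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def daily_bot_hits(lines):
    """Count requests per (ISO date, bot token); unparseable lines are skipped."""
    counts = Counter()
    for line in lines:
        m = COMBINED_TAIL.search(line)
        if m is None:
            continue
        try:
            day = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z").date()
        except ValueError:
            continue  # malformed timestamp
        for token in BOT_TOKENS:
            if token in m["ua"]:
                counts[(day.isoformat(), token)] += 1
                break  # attribute each request to one bot only
    return counts
```

It only needs an iterable of lines, so an open file handle works, e.g. `daily_bot_hits(open("access.log", encoding="utf-8", errors="replace"))`; rolling days up to months is a matter of grouping on `day[:7]`.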
  16. 20.194.157.187 - - [25/Feb/2026:00:19:45 +0000] "GET /url-slug HTTP/1.1" 200 176444 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot"
  22. Ways to identify non-explicit bots
    • Outdated versions of Chrome (118/119/120)
    • Often no referrer, though many logs don’t record the referrer anyway
    • Geo mismatch (e.g. Vietnam location, en-US language)
    • Simple user-agent names, e.g. “python-requests/2.28”
    • IPs that match datacentre ranges rather than residential ones
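The heuristics above can be turned into a simple per-request check. A minimal sketch; the extra library prefixes beyond python-requests are illustrative, `datacentre_cidrs` is a hypothetical input you would source from a datacentre/ASN IP list, and the geo-mismatch check is left out because it needs a GeoIP lookup plus the Accept-Language header, which standard access logs don’t carry. Each result is a set of signals to combine, not a verdict.

```python
import ipaddress
import re
from typing import Iterable

# Heuristics from the slide; patterns are assumptions to tune for your site.
OLD_CHROME = re.compile(r"Chrome/(118|119|120)\.")
# python-requests is from the slide; the other library prefixes are examples.
SIMPLE_UA = re.compile(r"^(python-requests|curl|wget|go-http-client|okhttp)/", re.I)


def bot_signals(ip: str, referrer: str, ua: str,
                datacentre_cidrs: Iterable[str] = ()) -> list[str]:
    """Return the names of the heuristics a request trips; more = more suspicious."""
    signals = []
    if OLD_CHROME.search(ua):
        signals.append("old-chrome")
    if referrer in ("", "-"):
        signals.append("no-referrer")  # weak on its own: many logs omit it
    if SIMPLE_UA.match(ua):
        signals.append("simple-ua")
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        addr = None  # unparseable IP: skip the datacentre check
    if addr is not None and any(
            addr in ipaddress.ip_network(c, strict=False) for c in datacentre_cidrs):
        signals.append("datacentre-ip")
    return signals


if __name__ == "__main__":
    # 203.0.113.0/24 is a documentation range standing in for a datacentre list.
    print(bot_signals("203.0.113.7", "-", "python-requests/2.28", ["203.0.113.0/24"]))
```

Returning named signals rather than a single score keeps the reasoning visible when you review flagged requests by hand.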
  23. An actual user log:
    72.14.201.170 - - [25/Feb/2026:00:11:20 +0000] "GET /?srsltid=AfmBOornwJvajwsfkfwOXm-u9URnQ7UMhySEX5D9X_PeK3tOliwzFlWP HTTP/1.1" 200 77195 "https://www.google.com/" "Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/145.0.0.0 Mobile Safari/537.36"
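Both log lines in this deck follow the Apache/Nginx combined format. The user agent is just a header the client chooses to send, so “ChatGPT-User” in a log line is a claim, not proof. A minimal sketch that parses a combined-format line and checks a claimed bot against IP ranges; `Hit`, `parse_line` and `claim_checks_out` are hypothetical names, and the range list is a placeholder for whatever the bot operator publishes for verification.

```python
import ipaddress
import re
from dataclasses import dataclass
from typing import Iterable, Optional

# Apache/Nginx "combined" log format.
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referrer>[^"]*)" "(?P<ua>[^"]*)"'
)


@dataclass
class Hit:
    ip: str
    ts: str
    method: str
    path: str
    status: int
    referrer: str
    ua: str


def parse_line(line: str) -> Optional[Hit]:
    """Parse one combined-format line; return None if it doesn't fit."""
    m = COMBINED.match(line)
    if m is None:
        return None
    return Hit(m["ip"], m["ts"], m["method"], m["path"],
               int(m["status"]), m["referrer"], m["ua"])


def ip_in_ranges(ip: str, cidrs: Iterable[str]) -> bool:
    """True if `ip` falls inside any of the CIDR ranges."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return False
    return any(addr in ipaddress.ip_network(c, strict=False) for c in cidrs)


def claim_checks_out(hit: Hit, token: str, published_cidrs: Iterable[str]) -> bool:
    """The UA claims to be `token` AND the IP sits in the operator's published ranges."""
    return token in hit.ua and ip_in_ranges(hit.ip, published_cidrs)


if __name__ == "__main__":
    line = ('20.194.157.187 - - [25/Feb/2026:00:19:45 +0000] "GET /url-slug HTTP/1.1" '
            '200 176444 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); '
            'compatible; ChatGPT-User/1.0; +https://openai.com/bot"')
    hit = parse_line(line)
    print(hit)
    # Placeholder range (192.0.2.0/24 is reserved for documentation).
    print(claim_checks_out(hit, "ChatGPT-User", ["192.0.2.0/24"]))
```

Requests that claim a bot token but fail the IP check are worth separating out: they are either spoofed or coming from ranges you haven’t loaded yet.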
  24. • CAPTCHA to validate humans? Buh-bye.
    • Don’t have a filter for the specific shade of grey your client asked for? Or the product doesn’t have the right alt attribute to describe it properly? Abandon.
    • Page takes too long to load? Nope.
    • Can’t easily search for v-neck in the on-site search? Gone.
    • Exact travel dates not available? Next hotel.
    • A pointless discount pop-up that blocks the category page? Bye!
  25. “If Cloudflare, CloudFront, DataDome, or your own security stack allows crawling but blocks conversion, update it now.” (Jes Scholz)
  26. Make sure your critical path is clear
    • Avoid CAPTCHA
    • Are all pages a 200 response?
    • Do your forms work properly?
    • Is the site accessible to a text browser? (e.g. WCAG AA/AAA)
    • Does your filtering work? Your on-site search?
    • Does your discount pop-up visually block the conversion path?
    • Have you removed automatic geo-based redirects?
    • Do you allow guest checkout? Or force login early?
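The “Are all pages a 200 response?” question can be answered from the same access logs. A minimal sketch that lists critical-path URLs returning anything other than 200; `CRITICAL_PREFIXES` is a hypothetical placeholder for your own funnel, and the check will also surface harmless responses such as 304s, so treat the output as a list to review rather than a list of bugs. Automatic geo redirects would show up here as 301s or 302s.

```python
import re
from collections import defaultdict

# Request line and status code from a combined-format log line.
REQUEST = re.compile(r'"(?:GET|POST|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')

# Hypothetical conversion path; replace with your own funnel URLs.
CRITICAL_PREFIXES = ("/search", "/category/", "/product/", "/cart", "/checkout")


def critical_path_errors(lines, prefixes=CRITICAL_PREFIXES):
    """Map each critical-path URL (query string dropped) to its non-200 statuses."""
    errors = defaultdict(set)
    for line in lines:
        m = REQUEST.search(line)
        if m is None:
            continue
        path = m["path"].split("?", 1)[0]
        status = int(m["status"])
        if status != 200 and path.startswith(prefixes):
            errors[path].add(status)
    return dict(errors)


if __name__ == "__main__":
    sample = [
        '1.2.3.4 - - [25/Feb/2026:00:00:01 +0000] "GET /checkout?step=2 HTTP/1.1" 403 512 "-" "Mozilla/5.0"',
        '1.2.3.4 - - [25/Feb/2026:00:00:03 +0000] "GET /category/knitwear HTTP/1.1" 302 0 "-" "Mozilla/5.0"',
    ]
    for path, statuses in sorted(critical_path_errors(sample).items()):
        print(path, sorted(statuses))
```

Running it over only the lines whose user agent claims to be an AI agent shows whether bots in particular are hitting the blockers listed above.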
  27. So what does this mean? We have to step out of a purely “SEO” stance again, with:
    • Retention conversions
    • CRO initiatives
    • Product details and ontology
    • Accessibility
    • Brand building via digital footprint
    …and yes, stricter technical SEO and managing tech debt
  28. Google Analytics doesn’t tell the right story
    • Agents accept cookies 3 out of 4 times
    • Agents will probably be logged as direct traffic
    • Agents will likely be logged as desktop
    • Agents will be Chrome/Chromium
    • Engagement metrics will be… weird
  29. …Or you can lift your metrics up to share of voice and share of market, rather than trying to pin this down to something absolute and fully quantifiable
  30. 3 takeaways to move forward with the new search experience in step with the business
    1. LLM bots are here to stay, whether we want them to be or not
    2. Bots are not forgiving of errors or conversion blockers
    3. This is more than traditional SEO
  31. Amanda King is human
    • 15+ years in the SEO industry
    • Business- and product-focussed
    • AI- and LLM-forward strategies
    • Visited 40+ countries, lived in 3
    • Always learning
    • Slightly obsessed with tea