Slide 1

Slide 1 text

The AI Tech SEO Compendium – Augmenting technical SEO tasks using ML & AI. Bastian Grimm, Peak Ace AG | @basgr

Slide 2

Slide 2 text

What's the most tedious task you'd like to automate?

Slide 3

Slide 3 text

Start small, validate initial ideas & concepts, then strategically scale up where feasible.

Slide 4

Slide 4 text

4 peakace.agency Winning with proven AI use cases (in marketing) Source: https://pa.ag/4bLPplv
▪ Retrieval-Augmented Generation: up to 95% accuracy automating answers
▪ Summarisation: up to 40% productivity gains in front and back office functions
▪ Content Generation: up to 40% cost savings in content creation
▪ Named Entity Recognition: up to 90% reduction of text reading and analysis work
▪ Insight Extraction: up to 80% faster data processing
▪ Classification: up to 30% cycle time reduction in customer service support

Slide 5

Slide 5 text

This deck is fairly technical… (so don't complain afterwards!) https://pa.ag/smx24

Slide 6

Slide 6 text

Before we dive straight in: establishing a common level of knowledge on some AI topics

Slide 7

Slide 7 text

What are Large Language Models (LLMs)?

Slide 8

Slide 8 text

Simply put: Large Language Models (LLMs) are AI systems trained on vast datasets (hence "large") to understand, predict and generate text using transformer-based neural networks.

Slide 9

Slide 9 text

9 peakace.agency A comprehensive overview of Large Language Models. And these are just some of the "bigger", noteworthy LLMs released up to the end of 2023: Source: https://pa.ag/4cdB55B

Slide 10

Slide 10 text

What are LLMs good at?

Slide 11

Slide 11 text

11 peakace.agency What are LLMs good at?
▪ Information Retrieval and Analysis: LLMs can sift through large volumes of text data to extract relevant information, summarise key points, and answer questions, making them valuable for research, data analysis, and decision-making support.
▪ Personalised Recommendations: LLMs can analyse user preferences and behaviour to provide personalised recommendations, such as articles or products, thus enhancing UX and engagement.
▪ Natural Language Processing: LLMs excel at understanding language, making them ideal for applications such as chatbots, language translation, sentiment analysis, and text summarisation.

Slide 12

Slide 12 text

What are LLMs NOT good at?

Slide 13

Slide 13 text

13 peakace.agency What are LLMs NOT good at?
▪ Understanding Context Beyond Training Data: LLMs may not perform well in situations requiring an understanding of context or knowledge beyond their original training data set.
▪ Making Ethical or Moral Judgments: LLMs lack the ability to make ethical or moral judgments and should not be used where such considerations are crucial. Most LLMs' decisions are also biased.
▪ Limited Understanding and Reasoning: LLMs can't form a chain of logical conclusions; instead, they follow probability rules. Even if the most common answer to a question is irrational or outright wrong, they will still provide that answer.

Slide 14

Slide 14 text

14 peakace.agency LLMs are also not good at creating original content. LLMs don't "write" anything – they generate text based on probabilities and the number of parameters used in their training, using content they've encountered before.

Slide 15

Slide 15 text

LLM Deep Dive

Slide 16

Slide 16 text

16 peakace.agency The "most popular" available LLM (interface) right now: Source: https://pa.ag/3AsVkun

Slide 17

Slide 17 text

Who is NOT using ChatGPT Plus?

Slide 18

Slide 18 text

Don't you worry… I'm not going to bother you with more prompts for ChatGPT and how to speed up your SEO. I mean, everyone's doing that by now anyway, right?

Slide 19

Slide 19 text

19 peakace.agency There are tons of commercial AI solutions available. From ChatGPT and Azure AI to NVIDIA's AI Platform and IBM watsonx – everyone in big tech is offering "something":

Slide 20

Slide 20 text

What about our friends over at Google?

Slide 21

Slide 21 text

21 peakace.agency How Google’s February ’24 went – in a nutshell Source: https://pa.ag/3uQ9dU1

Slide 22

Slide 22 text

22 peakace.agency ICYMI: OpenAI announced Sora. Sora is currently only accessible to red team members – experts in areas such as misinformation, hateful content, and bias – who examine critical areas for potential problems or risks. However, the preview is quite impressive. The excitement from the press has been reminiscent of the buzz surrounding the image creator DALL-E or ChatGPT: Sora is described as "eye-popping," "world-changing," and "breath-taking, yet terrifying." Sources: https://pa.ag/3IcBJm3 & https://pa.ag/4a7V2cb & https://pa.ag/3V1V2pw

Slide 23

Slide 23 text

23 peakace.agency Back to Google: say goodbye to Bard and hello to Gemini – Google’s AI chat bot gets a new name

Slide 24

Slide 24 text

24 peakace.agency There's more! Gemini is a family of multimodal LLMs developed by Google DeepMind:
▪ Multimodality: input/output using multiple formats (e.g., text, audio, video, gestures, etc.)
▪ Reinforcement learning: drastically reduces hallucinations
▪ 3rd party integrations: high efficiency when using external tools and API integrations
▪ Memory capabilities: builds and expands the knowledge bank while the model learns

Slide 25

Slide 25 text

25 peakace.agency Unsurprising to see this after the Hugging Face deal? Gemma is a family of lightweight, state-of-the-art open models built from the same research and technology Google used to create the Gemini models: Sources: https://pa.ag/3T9q0cK & https://pa.ag/4akEihZ

Slide 26

Slide 26 text

26 peakace.agency Also, a variety of (free) open-source models are available Hugging Face’s Open LLM Leaderboard aims to track, rank and evaluate open source LLMs on different benchmarks: Source: https://pa.ag/3L2qUEV

Slide 27

Slide 27 text

The beauty of these? Despite not being quite as powerful (yet), they are available to download, customise and self-host.

Slide 28

Slide 28 text

28 peakace.agency But where to start, and which LLM to use? From LLaMa 2 and Falcon to Dolly 2.0, MPT and Bloom – the choice is yours (yep, I know… overwhelming much?) Source: https://pa.ag/3Td5ucz
▪ LLaMa 2: a well-performing open-source LLM (with a license for commercial use) that encompasses pre-trained and fine-tuned generative text models with 7 to 70 billion parameters.
▪ Vicuna & Alpaca: use the LLaMa model as a basis and (like Google's Bard and OpenAI's ChatGPT) are fine-tuned to follow instructions; Vicuna is reported to come close to GPT-4-level quality.
▪ Falcon LLM: can be used with chatbots to generate text, solve complex problems, and reduce and automate repetitive tasks. Falcon 7B & 40B are available as raw models for fine-tuning.

Slide 29

Slide 29 text

29 peakace.agency How to host and run your private LLM? Easy… let’s just ask ChatGPT how to do it, shall we?

Slide 30

Slide 30 text

30 peakace.agency LM Studio: Discover, download, and run local LLMs. Run an AI on your desktop using locally installed open-source Large Language Models (LLMs) for free! Source: https://pa.ag/3UW0Dh7
With LM Studio, you can:
▪ Run LLMs on your laptop, even while offline (Win, Mac & Linux)
▪ Use models through the in-app UI or an OpenAI-compatible local server (see the sketch below)
▪ Download any compatible model files from Hugging Face repositories
▪ Discover new LLMs in the app
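A minimal sketch of talking to LM Studio's OpenAI-compatible local server from Python. The port (1234) and the model identifier are assumptions – use whatever the app shows for the model you have loaded:

```python
# Sketch: query a model served by LM Studio's local, OpenAI-compatible server.
from openai import OpenAI

# The API key is ignored by the local server; port 1234 is an assumption.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="local-model",  # placeholder: use the identifier of the loaded model
    messages=[{"role": "user", "content": "Explain hreflang in two sentences."}],
)
print(response.choices[0].message.content)
```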

Slide 31

Slide 31 text

31 peakace.agency My favourite: Ollama – get LLMs up and running, locally. It's command line only; use the PageAssist Chrome plug-in (a web UI for local LLMs) to control Ollama, including model pulls, configuration, and running LLM dialogues/chats. Sources: https://pa.ag/48A07se & https://pa.ag/48xAnNn Pro tip: Ollama runs at 127.0.0.1:11434 by default (and offers APIs as well – see the sketch below)
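A quick sketch of hitting Ollama's local HTTP API instead of the command line, assuming the default port and a model (here llama3, as an example) that has already been pulled with `ollama pull`:

```python
# Sketch: one-off text generation against a locally running Ollama instance.
import requests

resp = requests.post(
    "http://127.0.0.1:11434/api/generate",
    json={
        "model": "llama3",  # any locally pulled model
        "prompt": "Write a concise title tag for a page about vegan running shoes.",
        "stream": False,    # return one JSON object instead of a token stream
    },
    timeout=120,
)
print(resp.json()["response"])
```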

Slide 32

Slide 32 text

32 peakace.agency Want to try for yourself, but you’re not a developer? Solutions such as stack or LLMStack offer no-code DIY approaches by connecting and combining a variety of data sources through APIs and other endpoints including LLMs. Sources: https://pa.ag/3wu1UlA & https://pa.ag/3UXzY3K

Slide 33

Slide 33 text

33 peakace.agency Peak Ace’s current favourites: balancing speed & scalability A small selection of platforms that we feel are most convenient to start with. If you’d like to chat about it – come meet the Peak Ace team outside at our SMX booth! More complex, but worth checking out if you’re into this stuff: Hugging Face LLM Inference Container for Amazon SageMaker

Slide 34

Slide 34 text

34 peakace.agency Keep in mind: there are risks that need to be managed (obviously, this is true for both commercial and open-source models) Source: https://pa.ag/3Td5ucz
▪ Consent: ensuring training data was gathered with accountability, meaning it follows AI governance processes (compliant w/ laws & regulations)
▪ Security: security problems can include data leaks or cyber criminals using the LLM for a variety of malicious tasks
▪ Bias: happens when the data source is not diverse or representative enough
▪ Hallucinations: can result from the LLM being trained on incomplete, contradictory, or inaccurate data, or from predictions in general

Slide 35

Slide 35 text

35 peakace.agency Will hallucinations ever disappear? "It’s inherent in the mismatch between the technology and the proposed use cases," says Emily Bender, professor in the Department of Linguistics and director of the Computational Linguistics Laboratory at the University of Washington. Source: https://pa.ag/3PqP0Mh LLMs are designed to predict the next word – of course there will be cases where the model is wrong.

Slide 36

Slide 36 text

36 peakace.agency This can REALLY go wrong… "Life or Death: AI-generated mushroom foraging books are all over Amazon" – experts are worried that books produced by ChatGPT […], which target beginner foragers, could end up killing someone. Source: https://pa.ag/3MMah0X

Slide 37

Slide 37 text

We also need to talk RAG: Retrieval-Augmented Generation

Slide 38

Slide 38 text

So, what's the deal? Simply put, RAG integrates LLMs with external databases or APIs, thus enabling real-time information retrieval for up-to-date and more accurate responses.

Slide 39

Slide 39 text

RAG "fixes" the issue of outdated information caused by the training data cut-off.

Slide 40

Slide 40 text

40 peakace.agency The conceptual flow of using RAG with an LLM. RAG can be used to enhance the accuracy and reliability of gen AI models with facts fetched from external sources (a minimal sketch follows below). Source: https://pa.ag/3STryHY
1. Prompt + query
2. The query is used to search for relevant information across the knowledge sources
3. Relevant information is returned as enhanced context
4. Prompt + query + enhanced context are sent to the LLM endpoint
5. The LLM returns the generated text response
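A minimal, illustrative sketch of this flow. It assumes a sentence-transformers model for retrieval and a local Ollama instance as the LLM endpoint; the documents, model names and fixed top-2 retrieval are placeholders, not the deck's actual setup:

```python
# Sketch of the RAG flow above: retrieve the most relevant snippets for a query
# and prepend them to the prompt before calling an LLM.
import numpy as np
import requests
from sentence_transformers import SentenceTransformer

# Tiny in-memory "knowledge source" – in practice this would be a vector DB.
documents = [
    "Our returns policy allows refunds within 30 days of purchase.",
    "Shipping to the EU usually takes 3-5 business days.",
    "Support is available Monday to Friday, 9am-6pm CET.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = model.encode(documents, normalize_embeddings=True)

query = "How long do I have to return an item?"
q_vec = model.encode([query], normalize_embeddings=True)[0]

# Steps 2+3: semantic search – with normalised vectors, cosine similarity
# reduces to a dot product.
scores = doc_vecs @ q_vec
top = np.argsort(scores)[::-1][:2]
context = "\n".join(documents[i] for i in top)

# Steps 4+5: prompt + query + enhanced context sent to the LLM endpoint
# (here a local Ollama instance; the model must be pulled beforehand).
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
resp = requests.post(
    "http://127.0.0.1:11434/api/generate",
    json={"model": "llama3", "prompt": prompt, "stream": False},
    timeout=120,
)
print(resp.json()["response"])
```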

Slide 41

Slide 41 text

41 peakace.agency Why is RAG better/more efficient than other approaches? Because it can handle noisy or irrelevant information, refrain from answering when there is insufficient knowledge and integrate with a variety of different sources simultaneously. Source: https://pa.ag/3TdUgoc

Slide 42

Slide 42 text

42 peakace.agency The Self-RAG framework enhances LLM quality & factuality Self-RAG improves the output quality of LLMs by integrating retrieval, generation, and self-critique mechanisms. Source: https://pa.ag/3QP6MZ4 Self-RAG’s approach is to selectively retrieve relevant information and critique both the retrieved content and its own outputs, offering a more refined performance across various tasks compared to existing models.

Slide 43

Slide 43 text

43 peakace.agency Some real-world RAG use cases we've built for our clients in recent months – the most common ones we've seen and worked on:
▪ Chatbot: use RAG to incorporate LLMs into Q&A chatbots, allowing for more accurate answers based on data from company documents.
▪ Knowledge engine: ask questions based on your data to provide context for LLMs and greatly increase the quality and accuracy of answers.
▪ Search augmentation: incorporate LLMs into onsite search (engines) and augment the results with LLM-generated answers/content, leading to higher-quality results.

Slide 44

Slide 44 text

If you tie it all together (tools, databases, APIs, models, etc.) you can build some REALLY cool stuff!

Slide 45

Slide 45 text

Let's talk about some typical tech SEO challenges: redirects

Slide 46

Slide 46 text

Redirect mapping at scale is a tedious, error-prone task – but mainly because you're doing it wrong…

Slide 47

Slide 47 text

47 peakace.agency Collect the URL inventory with a crawling tool (or any other crawler of choice), then somehow – usually manually, in Excel or Google Sheets – align it with the new target structure (depending on the respective content) and generate the redirect mapping file from it. Creating 1-to-1 redirect mappings for old content is typically done by attempting to match titles, headings or URLs by hand.

Slide 48

Slide 48 text

For example, outdated categories are often "blindly" redirected to parent categories or even the homepage.

Slide 49

Slide 49 text

Yeah, horrifying indeed…

Slide 50

Slide 50 text

AI makes this process significantly faster and more efficient

Slide 51

Slide 51 text

51 peakace.agency Embeddings and vector database = redirect win. The necessary steps for better automated redirects (and an improved customer journey):
1. Extract the main content of every (old) site/URL
2. Generate embeddings
3. Save them together with metadata in a vector database
4. Run a semantic search in the vector DB based on the embeddings of the old URLs

Slide 52

Slide 52 text

52 peakace.agency And our SEOs be like...

Slide 53

Slide 53 text

53 peakace.agency I got 99 problems but AI ain’t one…! (at least for now) Grab one outside, expo hall, booth #1 (ground floor) – see you there!

Slide 54

Slide 54 text

What are (word) embeddings? Word embeddings are numerical vectors representing words, capturing their meanings and relationships in a multidimensional space.

Slide 55

Slide 55 text

What are (word) embeddings? You can convert any word into a vector and start calculating with them: "king" minus "man" plus "woman" equals "queen" (see the sketch below). Synonyms and more can also be found this way.
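A small sketch of this vector arithmetic using pre-trained GloVe vectors via gensim; the specific model name is just one readily downloadable option:

```python
# Sketch: word-vector arithmetic and nearest-neighbour lookups with gensim.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")  # downloads ~130 MB on first run

# king - man + woman ≈ queen
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
# -> [('queen', ...)]

# Nearest neighbours double as a cheap synonym / related-term lookup
print(vectors.most_similar("laptop", topn=3))
```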

Slide 56

Slide 56 text

What about vector databases? A vector DB utilises data embeddings as an index, facilitating fast and scalable searches among unstructured data points and enhancing efficiency in retrieving similar items or information.

Slide 57

Slide 57 text

Simply put: a vector DB allows you to find matches between anything and anything (e.g., use an image as a query to find similar pieces of text, video, other images, etc.).

Slide 58

Slide 58 text

Putting it all together: a quick, step-by-step overview

Slide 59

Slide 59 text

59 peakace.agency Extracting the main content of every old URL
▪ Extract: title tag, h1, h2s + main content
▪ Combine: title, h1, h2s and the first & last sentence of each paragraph
▪ Result: Content = title + h1 + h2s + first & last sentence of each paragraph (a minimal extraction sketch follows below)
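A minimal extraction sketch under these rules, assuming reasonably clean HTML; the URL and the naive full-stop-based sentence splitting are illustrative only:

```python
# Sketch: build the "Content" string (title + h1 + h2s + first & last sentence
# of each paragraph) for a given URL.
import requests
from bs4 import BeautifulSoup

def first_and_last_sentence(text: str) -> str:
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    if not sentences:
        return ""
    return sentences[0] + ". " + sentences[-1] + "."

def extract_content(url: str) -> str:
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")

    title = soup.title.get_text(strip=True) if soup.title else ""
    h1 = " ".join(h.get_text(strip=True) for h in soup.find_all("h1"))
    h2s = " ".join(h.get_text(strip=True) for h in soup.find_all("h2"))
    paragraphs = " ".join(
        first_and_last_sentence(p.get_text(strip=True)) for p in soup.find_all("p")
    )

    # Content = title + h1 + h2s + first & last sentence of each paragraph
    return " ".join(part for part in [title, h1, h2s, paragraphs] if part)

print(extract_content("https://www.example.com/old-category/"))
```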

Slide 60

Slide 60 text

60 peakace.agency Generate embeddings and store them in a vector database. For each website URL:
▪ Transfer the previously generated content to the vector DB
▪ Generate embeddings (BERT, GloVe, FastText)
▪ Save the embeddings in a vector DB incl. metadata (URL, title, etc.)
[Diagram: each content block is mapped to an embedding vector, e.g. 0.03 … 0.19] (a minimal sketch follows below)
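A small sketch of the embedding step. The slide lists BERT, GloVe and FastText; the sentence-transformers model below is just one easy-to-run option, and the URLs/content are placeholders:

```python
# Sketch: turn each URL's extracted content into a vector plus metadata,
# ready to be written to a vector DB.
from sentence_transformers import SentenceTransformer

pages = {  # url -> content produced by the extraction step above
    "https://www.example.com/new-category/": "Running shoes … free returns.",
    "https://www.example.com/new-guide/": "How to choose trail running shoes …",
}

model = SentenceTransformer("all-MiniLM-L6-v2")

records = []
for url, content in pages.items():
    vector = model.encode(content, normalize_embeddings=True)
    records.append({"url": url, "content": content, "embedding": vector.tolist()})

print(len(records[0]["embedding"]))  # 384 dimensions for this particular model
```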

Slide 61

Slide 61 text

61 peakace.agency Search the vector database for the best semantic match. For every outdated page (i.e. every future 404):
▪ Run a vectoric semantic search for the KNN (k-nearest neighbour)
▪ Set a 301 to the NN URL – no more weak redirects
▪ Play with the certainty/temperature settings
The slide's example query (Weaviate-style GraphQL), tidied up:
{ Get { Article ( nearVector: { vector: [embedding], certainty: 0.8 } limit: 1 ) { url } } }
A Python sketch of the same step follows below.
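A Python sketch of the search step using Chroma as a stand-in local vector DB (the slide's own query is Weaviate-style); URLs, content and the 0.75 similarity gate are placeholders to tune:

```python
# Sketch: store the new URLs in a local vector DB and, for every outdated page,
# look up the nearest neighbour to build a 301 redirect map.
import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

new_pages = {
    "https://www.example.com/new-category/": "Running shoes … free returns.",
    "https://www.example.com/new-guide/": "How to choose trail running shoes …",
}
old_pages = {  # the "future 404s"
    "https://www.example.com/old-category/": "Running shoes … 30 day returns.",
}

collection = chromadb.Client().create_collection(
    name="new_urls", metadata={"hnsw:space": "cosine"}
)
for url, content in new_pages.items():
    collection.add(
        ids=[url],
        embeddings=[model.encode(content).tolist()],
        metadatas=[{"url": url}],
    )

redirect_map = {}
for old_url, content in old_pages.items():
    res = collection.query(query_embeddings=[model.encode(content).tolist()], n_results=1)
    new_url = res["metadatas"][0][0]["url"]
    similarity = 1 - res["distances"][0][0]  # cosine distance -> similarity
    if similarity >= 0.75:
        redirect_map[old_url] = new_url  # 301 old_url -> new_url
    # anything below the threshold goes to manual review instead

print(redirect_map)
```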

Slide 62

Slide 62 text

Down the rabbit hole…

Slide 63

Slide 63 text

There are A LOT of different ways of doing this…

Slide 64

Slide 64 text

64 peakace.agency State-of-the-art sentence embeddings are the gold standard. The Levenshtein distance (basic fuzzy matching) provides an alternative, as we're mainly dealing with small text snippets and minimal deviations between URL versions. The more substantial the changes between two versions, the higher the likelihood that you'll reap significant benefits from leveraging sentence transformers (a quick comparison sketch follows below). Source: https://pa.ag/49RHG3y h/t Will Nye for the data set
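A quick way to compare both approaches on your own data: basic Levenshtein-style fuzzy matching next to sentence-embedding cosine similarity for one old/new title pair. Library and model choices are illustrative:

```python
# Sketch: fuzzy matching vs. sentence embeddings for a single title pair.
from rapidfuzz import fuzz
from sentence_transformers import SentenceTransformer, util

old_title = "Women's running shoes - buy online"
new_title = "Running shoes for women | Free delivery"

# Levenshtein-style fuzzy matching: cheap, strong for near-identical strings
print("fuzzy:", fuzz.token_set_ratio(old_title, new_title))  # 0-100

# Sentence embeddings: slower, but robust to rewording
model = SentenceTransformer("all-MiniLM-L6-v2")
emb = model.encode([old_title, new_title], convert_to_tensor=True)
print("cosine:", util.cos_sim(emb[0], emb[1]).item())  # -1..1
```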

Slide 65

Slide 65 text

Rule of thumb: calculating similarity scores across multiple elements and selecting the best matches always works best (a small sketch of this follows below).
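An illustrative sketch of this rule of thumb: combine per-element similarities (title, h1, URL slug) into one weighted score. The fields and weights are made-up placeholders to tune against a labelled sample:

```python
# Sketch: weighted multi-element similarity score for redirect candidates.
from rapidfuzz import fuzz

def combined_score(old: dict, new: dict) -> float:
    weights = {"title": 0.5, "h1": 0.3, "slug": 0.2}  # placeholder weights
    score = 0.0
    for field, weight in weights.items():
        score += weight * fuzz.token_set_ratio(old[field], new[field]) / 100
    return score  # 0..1, higher = better candidate match

old = {"title": "Women's running shoes", "h1": "Running shoes", "slug": "womens-running-shoes"}
new = {"title": "Running shoes for women", "h1": "Running shoes for women", "slug": "running-shoes-women"}
print(round(combined_score(old, new), 3))
```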

Slide 66

Slide 66 text

Rule of thumb: matching on a single element always performs worse – regardless of the approach you choose.

Slide 67

Slide 67 text

BTW: before you try any of this… pattern match first!

Slide 68

Slide 68 text

Whatever you choose… you need solid QA afterwards!

Slide 69

Slide 69 text

Don't forget about input quality: garbage in = garbage out!

Slide 70

Slide 70 text

As with most things, it can boost efficiency, but it isn't a complete replacement for a human.

Slide 71

Slide 71 text

Who loves ScreamingFrog?

Slide 72

Slide 72 text

Facebook AI Similarity Search (FAISS): analyse page contents and automatically create redirect maps based on two (old vs new) SF crawls.

Slide 73

Slide 73 text

73 peakace.agency Automated redirect matchmaker for site migrations. A fantastic script by Daniel Emery utilising two SF crawls (origin + destination.csv with titles, metas, URLs and headings) to perform a fast semantic search (using sentence transformers) and create a redirect map. FAISS is an outstanding library designed for the fast retrieval of nearest neighbours in high-dimensional spaces; it enables quick semantic nearest-neighbour searches even at a large scale. Sources: https://pa.ag/4bWAgxy & https://pa.ag/3USteUJ
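This is not Daniel Emery's script – just a minimal sketch of the FAISS part of such a workflow, assuming Screaming Frog-style exports with "Address" and "Title 1" columns (an assumption) and a sentence-transformers model:

```python
# Sketch: index destination-crawl embeddings in FAISS and find the nearest
# neighbour for every origin URL, writing out a proposed redirect map.
import faiss
import numpy as np
import pandas as pd
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

origin = pd.read_csv("origin.csv")            # assumed columns: "Address", "Title 1"
destination = pd.read_csv("destination.csv")

dest_vecs = model.encode(destination["Title 1"].fillna("").tolist(), normalize_embeddings=True)
orig_vecs = model.encode(origin["Title 1"].fillna("").tolist(), normalize_embeddings=True)

# Inner product equals cosine similarity because the vectors are normalised.
index = faiss.IndexFlatIP(dest_vecs.shape[1])
index.add(np.asarray(dest_vecs, dtype="float32"))

scores, ids = index.search(np.asarray(orig_vecs, dtype="float32"), k=1)
origin["proposed_redirect"] = destination["Address"].iloc[ids[:, 0]].values
origin["similarity"] = scores[:, 0]
origin.to_csv("redirect_map.csv", index=False)
```

The similarity column is what you QA afterwards: spot-check the high scorers, manually review the low ones.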

Slide 74

Slide 74 text

Significant time savings: not 100% perfect, but ~90% accurate/sensible matches are perfectly realistic.

Slide 75

Slide 75 text

This doesn't only work for redirects… you can use the same approach for, e.g., much better internal linking as well as reverse content gap analysis.

Slide 76

Slide 76 text

Be smart with your redirects: put them on the edge

Slide 77

Slide 77 text

Wait, what?

Slide 78

Slide 78 text

78 peakace.agency Cloudflare Workers to execute redirects on CDN/edge level. I already spoke about using CF Workers for a variety of technical SEO tasks, including redirects, at SMX Advanced in Berlin back in 2021. Looking to dive deeper? Make sure to grab a copy of the deck: Source: https://pa.ag/4bSxauE Pro tip: this rarely requires dev resources; either you can do it yourself, or sys ops (who tend to be less busy) can.

Slide 79

Slide 79 text

79 peakace.agency

Slide 80

Slide 80 text

80 peakace.agency Naturally, Cloudflare is all in on AI as well… Build and deploy AI applications to CF's global network: all it takes is a few lines of code with Workers AI to run an AI task using the Workers framework (or any other stack via API): Source: https://pa.ag/3IgVBV6

Slide 81

Slide 81 text

81 peakace.agency Workers AI – an AI inference-as-a-service platform. Empowering developers to run well-known AI models with just a few lines of code on serverless GPUs, all on CF's trusted global network (see the sketch below): Source: https://pa.ag/3Tgqlfh TL;DR: using the LLM of your choice without having to worry about hosting, deployment, scale, …
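A sketch of the "any other stack via API" route, calling Workers AI over its REST endpoint from Python. The account ID, API token and model name are placeholders, and the exact response shape should be checked against the current Workers AI docs:

```python
# Sketch: run a hosted text model on Workers AI via the REST API.
import os
import requests

ACCOUNT_ID = os.environ["CF_ACCOUNT_ID"]
API_TOKEN = os.environ["CF_API_TOKEN"]
MODEL = "@cf/meta/llama-3-8b-instruct"  # assumption: pick any available text model

url = f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/{MODEL}"
payload = {
    "messages": [
        {"role": "user", "content": "Write a 155-character meta description for a page about trail running shoes."}
    ]
}

resp = requests.post(url, headers={"Authorization": f"Bearer {API_TOKEN}"}, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["result"]["response"])
```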

Slide 82

Slide 82 text

82 peakace.agency But it doesn't stop there: meet Vectorize. Use Vectorize to power, e.g., semantic search directly with Workers, improve the accuracy and context of answers from LLMs, and/or bring your own embeddings from other platforms, including OpenAI and Cohere: Sources: https://pa.ag/49Rys7u & https://pa.ag/3wq2AIr

Slide 83

Slide 83 text

You do realise I just solved all your implementation problems!?

Slide 84

Slide 84 text

84 peakace.agency

Slide 85

Slide 85 text

Automating SEO tasks & workflows with Custom GPTs

Slide 86

Slide 86 text

What are Custom GPTs (for ChatGPT)? Custom GPTs are tailored versions of ChatGPT that combine instructions, extra knowledge, and any combination of skills.

Slide 87

Slide 87 text

87 peakace.agency A Custom GPT in its simplest form: Using Peak Ace’s Structured Data GPT to debug and fix errors in JSON-LD mark-up

Slide 88

Slide 88 text

Who here has already created their own Custom GPT?

Slide 89

Slide 89 text

89 peakace.agency Unveiling Peak Ace's GPT Suite Source: https://pa.ag/peakace-gptsuite
▪ SEO Writing Assistant: for keyword analysis, SEO content checks, readability assessments, competitor analyses, multilingual support, mobile optimisation, and more: https://pa.ag/seo-writing-assistant
▪ Outreach Hero: for crafting unique email templates, engaging subject lines, clear messages and more: https://pa.ag/outreach-hero
▪ PPC Performance Analyzer: for data analysis and adaptability, optimisation suggestions and more, all with perfect confidentiality: https://pa.ag/ppc-performance-analyzer
▪ Structured Data GPT: for analysing and troubleshooting structured data for SEO, optimisation suggestions, technical implementation support, and more: https://pa.ag/structured-data

Slide 90

Slide 90 text

OK, I get it… boooooooring!?

Slide 91

Slide 91 text

But what about 3rd party data integrations (e.g., via API)?

Slide 92

Slide 92 text

92 peakace.agency Making GPTs smarter with external data. A Custom GPT connects with the DataForSEO API to allow for real-time access to actual search data:

Slide 93

Slide 93 text

Why use external data?

Slide 94

Slide 94 text

ChatGPT can do this out of the box, can't it? Well, no… the (training) data is insufficient and/or outdated, and numbers are either non-existent or completely made up.

Slide 95

Slide 95 text

So, how can you build this yourself? Here's a quick three-step guide on how to DIY it.

Slide 96

Slide 96 text

96 peakace.agency #1 Provide basic info to get started (name, description, …). Log in to ChatGPT > choose Explore GPTs > Create (you need ChatGPT Plus). Well-defined instructions are key – think prompting.

Slide 97

Slide 97 text

97 peakace.agency #2 Create an 'Action' to call a 3rd party API. Head to your API provider and grab your credentials – in our case this was the API Dashboard at DataForSEO.com. To use it with an action, you need to generate a base64-encoded version of your login credentials: btoa('APIemail:APIpass') – a Python equivalent follows below. The annoying part: you need a schema according to the OpenAPI spec. But no one reads docs anymore – we just leverage ChatGPT to do this. Get the OpenAPI schema for DataForSEO: https://pa.ag/3Pa7oZ3
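If you'd rather not use the browser console, here's a Python equivalent of the btoa('APIemail:APIpass') step; the credentials are placeholders, and the resulting string goes into the action's Basic auth configuration:

```python
# Sketch: base64-encode "login:password" for the GPT action's Basic auth header.
import base64

api_login = "you@example.com"      # placeholder
api_password = "your-api-password"  # placeholder

token = base64.b64encode(f"{api_login}:{api_password}".encode()).decode()
print(token)                             # paste this into the action's auth config
print(f"Authorization: Basic {token}")   # the header the API will ultimately receive
```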

Slide 98

Slide 98 text

#3 Test and publish your GPT. Remember: APIs usually aren't free, so make sure you only publish your new Custom GPT for yourself!

Slide 99

Slide 99 text

Customisation for other APIs (e.g., Sistrix, etc.) is easy: just reauthenticate (with a base64-encoded version of your login). You'll also need a new schema (again based on the OAS spec).

Slide 100

Slide 100 text

100 peakace.agency Did you know? You can link using pre-filled prompts! You can also link directly to pre-filled prompts and execute them – this works for both Custom GPTs and GPT-4 models. Simply add the query string (using "q=xxx") to the end of your ChatGPT URL. Source: https://pa.ag/crsum
▪ For any Custom GPT, add: ?q=your+prompt+goes+here
▪ For the GPT-4 base model: ?model=gpt-4&q=your+prompt
Use directly in your Chrome browser.
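A tiny helper for building such pre-filled prompt links; the chatgpt.com base URL and the Custom GPT path segment ("g/g-XXXX-…") are assumptions to adapt to your own GPT:

```python
# Sketch: build pre-filled ChatGPT prompt links with URL-encoded prompts.
from urllib.parse import quote_plus

def chatgpt_link(prompt: str, gpt_path: str | None = None) -> str:
    base = "https://chatgpt.com/"
    q = quote_plus(prompt)  # spaces become '+', matching the q=your+prompt format
    if gpt_path:            # e.g. "g/g-XXXX-structured-data-gpt" (placeholder)
        return f"{base}{gpt_path}?q={q}"
    return f"{base}?model=gpt-4&q={q}"

print(chatgpt_link("Summarise the latest Google algorithm update"))
print(chatgpt_link("Validate this JSON-LD", gpt_path="g/g-XXXX-structured-data-gpt"))
```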

Slide 101

Slide 101 text

101 peakace.agency When to use a Custom GPT? Besides seamless 3rd party data integration, my top-3 reasons why building and using Custom GPTs can help a lot:
▪ Long-term context: Custom GPTs are a really powerful tool to ensure instructions remain contextualised over long periods of time.
▪ Building workflows: Custom GPTs are best suited for composing workflows aimed at people who don't know how to properly design context with prompt sequences.
▪ Sharing instructions: for sharing the exact same instructions, e.g. cross-team, without having to worry about specifying them (and how) at prompt level.

Slide 102

Slide 102 text

102 peakace.agency BTW: not compatible & very different… GPTs for MS Copilot. ChatGPT has almost completely replaced plugins with GPTs. On Copilot, plugins call on external services; a Copilot GPT, however, is a conversation with a specific goal: Source: https://pa.ag/3wyCrr0

Slide 103

Slide 103 text

103 peakace.agency Copilot + Excel: taking data analysis to a whole new level! Super pumped for this, as it’ll enable just about anyone to question, analyse, visualise and refine data effortlessly: Source: https://pa.ag/49YhhBe

Slide 104

Slide 104 text

I know, I know… it's a lot.

Slide 105

Slide 105 text

105 peakace.agency Looking to learn more about AI this year? Some new (and free) AI courses and resources to help you boost your AI knowledge: Sources: https://pa.ag/48NKLkk / https://pa.ag/4c2sa6U / https://pa.ag/48FldFJ / https://pa.ag/4c2smTG / http://pa.ag/ai

Slide 106

Slide 106 text

Want to chat about AI & grab a t-shirt? Meet Peak Ace in the expo hall, booth #1 (ground floor) – https://pa.ag/smx24