Slide 1

Slide 1 text

HOW SEARCH ENGINES REALLY WORK IN 2023

Slide 2

Slide 2 text


Slide 3

Slide 3 text


Slide 4

Slide 4 text

Salutations! I’m Mike King (@iPullRank)

Slide 5

Slide 5 text

Disclaimer: Just because something is in a patent or a whitepaper does not mean that Google uses it… but it probably does.

Slide 6

Slide 6 text

Disclaimer: Correlation is not causation… but it probably is.

Slide 7

Slide 7 text

So, I Have a Book Coming Soon The Science of SEO: Decoding Search Engine Algorithms. This is the cover that my publisher sent me. I’m not a fan, but we’ll see what happens. Anyway, you can preorder it wherever books are sold. Here’s the Amazon link. https://amzn.to/3T9qkYN

Slide 8

Slide 8 text

Thesis: Search engines are not magic. You can deeply understand them if you learn more about Information Retrieval and pay close attention to engineering research.

Slide 9

Slide 9 text

If you pay enough attention, they are telling you everything you want to know about how Google works.

Slide 10

Slide 10 text

A Selective History of Information Retrieval and Search Engines

Slide 11

Slide 11 text

Meet Mortimer Taube Taube invented the “Uniterm Indexing System” in 1951 because he felt the Dewey Decimal System could not keep up with the pace of information after the war. This is the basis of what is called an Inverted Index, the data structure behind what we think of as an “index” in search engines.

Slide 12

Slide 12 text

Meet Hans Peter Luhn This guy invented the concept of Term Frequency in his paper “The Automatic Creation of Literature Abstracts.” He also invented hashmaps, but we’ll talk about that later.

Slide 13

Slide 13 text

Meet Karen Spärck Jones Spärck Jones is considered the godmother of IR. She contributed to information retrieval in many ways, but she is best known for inventing Inverse Document Frequency in the 1970s.

Slide 14

Slide 14 text

The Two Concepts Together Yielded TF*IDF
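Combining Luhn’s term frequency with Spärck Jones’s inverse document frequency gives TF*IDF. A minimal sketch of the idea; the toy corpus and function name are my own, not anything from Google:

```python
import math
from collections import Counter

def tf_idf(term, doc, corpus):
    """tf: raw count of the term in the document (Luhn's term frequency).
    idf: log(N / df), where df is the number of documents containing the term
    (Spärck Jones's inverse document frequency)."""
    tf = Counter(doc)[term]
    df = sum(1 for d in corpus if term in d)
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf

corpus = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the search engines rank the documents".split(),
]
# "the" appears in every document, so its idf (and score) is zero;
# a rarer word like "mat" scores higher than a more common one like "cat".
print(tf_idf("the", corpus[0], corpus))
print(tf_idf("mat", corpus[0], corpus))
```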

Slide 15

Slide 15 text

Meet Gerard Salton Gerard Salton is the godfather of Information Retrieval. Much of how search engines of all kinds work is based on methods that he and his team invented.

Slide 16

Slide 16 text

Gerard Salton invented the Vector Space Model In the vector-space model, documents and queries are converted to vector representations and plotted in multi-dimensional space. The query and document vectors are then compared based on cosine similarity and the ones that are closest to the query are the most relevant. The main takeaway here is that relevance is a quantitative value. This is perhaps the most important concept to understand about how search works.
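The takeaway that relevance is a quantitative value can be shown in a few lines. A hedged sketch of the vector space model over raw term counts (real systems use weighted, dense vectors; the query and documents here are invented for illustration):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (dicts)."""
    terms = set(a) | set(b)
    dot = sum(a.get(t, 0) * b.get(t, 0) for t in terms)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

query = {"enterprise": 1, "seo": 1}
doc_a = {"enterprise": 3, "seo": 2, "platform": 1}
doc_b = {"recipe": 2, "chicken": 4}

# Relevance is a number: doc_a is measurably closer to the query than doc_b.
print(cosine_similarity(query, doc_a))
print(cosine_similarity(query, doc_b))
```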

Slide 17

Slide 17 text

Disclaimer: Correlation is not causation… but it probably is. You are here

Slide 18

Slide 18 text

He’s also Responsible for the SMART Evaluation Model

Slide 19

Slide 19 text

Amit Singhal was a graduate student who studied directly under Gerard Salton.

Slide 20

Slide 20 text

Amit Singhal Rewrote Google Search in 2001 And he was nice enough to tell us exactly how he did it. http://singhal.info/ieee2001.pdf

Slide 21

Slide 21 text

Around this Time, Google’s Scoring Functions were Mostly just PageRank + BM25(F)

Slide 22

Slide 22 text

Meet Brian Pinkerton Brian built the first commercially available web-scale search engine based on a crawler called WebCrawler in 1994. He wrote about it extensively for his PhD thesis. http://www.thinkpink.com/bp/Thesis/Thesis.pdf

Slide 23

Slide 23 text

Every Search Engine is Based on WebCrawler Every search engine can trace its roots back to WebCrawler to some degree. In fact, Lycos, AltaVista, and Google all reference it in their early papers and patents. You know why page titles have been so important for so long? Early search engines only indexed page titles.

Slide 24

Slide 24 text

Google Copied a Lot from AltaVista

Slide 25

Slide 25 text

…including Penalizing Websites

Slide 26

Slide 26 text

Many Googlers, of course, came from AltaVista

Slide 27

Slide 27 text

Meet Jeff Dean This is Jeff Dean. He’s had an engineering hand in many of Google’s most important innovations ever.

Slide 28

Slide 28 text

Jeff Talks About How He Ended Up at Google from AltaVista https://www.quora.com/What-was-it-like-to-work-on-the-AltaVista-team-in-the-90s?ch=10&oid=960520&share=21e5a871&srid=uHsr&target_type=question

Slide 29

Slide 29 text

The Guy is the Chuck Norris of Computer Science

Slide 30

Slide 30 text

Fun Fact I’m the only SEO that Jeff follows, so by the principles of PageRank, I’m the greatest SEO of all time.

Slide 31

Slide 31 text

Tweets is Watching The SVP that runs Search at Google is also following me, so if anything happens to me…

Slide 32

Slide 32 text


Slide 33

Slide 33 text

Google Operates as a Shared Environment All the software across the ecosystem can be installed on any machine, and any process can be run on any machine. For example, a crawler could also run on a machine that is managing rendering, processing, or anything else.

Slide 34

Slide 34 text

Fun Fact: Penguin was built on top of Panda Panda was a group-specific modification factor that was computed as a function of the number of independent links divided by the number of reference queries. Penguin built on top of that quality score and applied it to links.

Slide 35

Slide 35 text

Our Understanding of Search Engines Is Out of Date

Slide 36

Slide 36 text

At a Base Level, This is What All Search Engines Do Fundamentally, this is the basis of how search engines function. Google has developed many layers on top of this, but this is the core of what they all do.

Slide 37

Slide 37 text

Google’s High-Level Pipeline Abstraction

Slide 38

Slide 38 text

We know this, but there is a single set of innovations that sped Google past the SEO community.

Slide 39

Slide 39 text

Lexical Search vs Semantic Search are the Two Primary Search Models What we as the SEO community do not have a strong enough handle on is that most of what Google’s doing is on the semantic side and that has all improved dramatically over the last 10 years based on machine learning.

Slide 40

Slide 40 text

Vector Space Model Again Let’s go back to the vector space model again. This model is a lot stronger in the neural network environment because Google can capture more meaning in the vector representations.

Slide 41

Slide 41 text

Words are Converted to Multi-dimensional Coordinates in Vector Space

Slide 42

Slide 42 text

This Allows for Mathematical Operations Comparisons of content and keywords become linear algebraic operations.

Slide 43

Slide 43 text

Relevance is a Function of Cosine Similarity When we talk about relevance, it’s a question of how similar the vectors are between documents and queries. This is a quantitative measure, not the qualitative idea of relevance we typically work with.

Slide 44

Slide 44 text

TF-IDF Vectors The vectors in the vector space model were built from TF-IDF. These were simplistic, based on the Bag-of-Words model, and they did not do much to encapsulate meaning.

Slide 45

Slide 45 text

Word2Vec Gave Us Embeddings Word2Vec was an innovation led by Tomas Mikolov and Jeff Dean that yielded an improvement in natural language understanding by using neural networks to compute word vectors. These were better at capturing meaning. Many follow-on innovations like Sentence2Vec and Doc2Vec would follow.

Slide 46

Slide 46 text

We Went from Sparse Embeddings to Dense Embeddings

Slide 47

Slide 47 text


Slide 48

Slide 48 text

Word2Vec Captured Relationships, but Not Context – BERT Captures Context

Slide 49

Slide 49 text

BERT Yields Embeddings with Higher Dimensionality and Information Capture

Slide 50

Slide 50 text

Source: https://cloud.google.com/blog/topics/developers-practitioners/find-anything-blazingly-fast-googles-vector-search-technology

Slide 51

Slide 51 text

Dense Retrieval You remember “passage ranking?” This is built on the concept of dense retrieval, wherein there are more embeddings representing more of the query and the document to uncover deeper meaning.

Slide 52

Slide 52 text

Dense Retrieval is Scoring down to the Sentence Level
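Sentence-level scoring can be illustrated with a toy passage ranker: score each sentence of a document against the query and keep the best one. Real dense retrieval uses neural embeddings; the bag-of-words vectors here are only a stand-in, and every name and string is invented for illustration:

```python
import math

def bow(text):
    # Stand-in for a dense embedding: a sparse bag-of-words vector.
    v = {}
    for t in text.lower().split():
        v[t] = v.get(t, 0) + 1
    return v

def cosine(a, b):
    dot = sum(a.get(t, 0) * b.get(t, 0) for t in set(a) | set(b))
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def best_passage(query, document):
    """Score each sentence separately and return the best-matching one."""
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    return max(sentences, key=lambda s: cosine(bow(query), bow(s)))

doc = ("Our agency was founded in 2010. "
       "Crawl budget is the number of URLs Googlebot will fetch. "
       "We also offer paid media services.")
# The middle sentence wins even though the page is "about" something else.
print(best_passage("what is crawl budget", doc))
```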

Slide 53

Slide 53 text

Introducing Google’s Version of Dense Retrieval Google introduces the idea of “aspect embeddings,” which is a series of embeddings that represent the full elements of both the query and the document and give stronger access to deeper information.

Slide 54

Slide 54 text

Dense Representations for Entities Google has improved its entity resolution using embeddings, giving them stronger access to information in documents.

Slide 55

Slide 55 text

Embeddings keep getting better at capturing meaning while SEO tools still operate on the Lexical Search model

Slide 56

Slide 56 text


Slide 57

Slide 57 text

I Feel Like My Page is More Relevant to [Enterprise SEO]

Slide 58

Slide 58 text

Relevance isn’t Qualitative to Google.

Slide 59

Slide 59 text


Slide 60

Slide 60 text

See! My page is more relevant, but it’s not ranking as well.

Slide 61

Slide 61 text

https://ipullrank.com/tools/orbitwise

Slide 62

Slide 62 text

How Crawling Works

Slide 63

Slide 63 text

Crawling in the High-Level Pipeline Abstraction You are here

Slide 64

Slide 64 text

How Google Crawls the Web Most of the magic happens in the URL manager. The crawler simply accesses a page and extracts it. The processing pipeline handles most of the actual parsing. Source: Distributed Crawling of Hyperlinked Documents https://patents.google.com/patent/US8812478B1/en

Slide 65

Slide 65 text

Google Doesn’t Crawl Link to Link; They Crawl Based on a URL Queue

Slide 66

Slide 66 text

Crawling is Stateless Googlebot does not hold a “state.” Although it has the capability to, it does not maintain cookies, fill out forms, or make POST requests. Every page it looks at is as though it logged on to the web for the first time.

Slide 67

Slide 67 text

Google Crawls with a Very Tall Viewport As we know, Googlebot is crawling mobile-first primarily, but there are limits to what it will see with infinite scroll.

Slide 68

Slide 68 text

Crawl Models Typical IR models are breadth-first (a whole level is reviewed before moving down) or depth-first (every path is followed to its last node before moving on). Google uses a “best-first” model following PageRank.
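The best-first model can be sketched with a priority queue: instead of FIFO (breadth-first) or LIFO (depth-first) ordering, the frontier is popped by a priority score standing in for PageRank. The graph, scores, and function name below are invented for illustration:

```python
import heapq

def best_first_crawl(seeds, links, score, limit):
    """Crawl a link graph 'best-first': always fetch the highest-scoring
    queued URL next, rather than strictly breadth- or depth-first."""
    # heapq is a min-heap, so scores are negated to pop the best URL first.
    queue = [(-score(u), u) for u in seeds]
    heapq.heapify(queue)
    seen = set(seeds)
    order = []
    while queue and len(order) < limit:
        _, url = heapq.heappop(queue)
        order.append(url)
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(queue, (-score(nxt), nxt))
    return order

# Toy link graph with made-up priority scores standing in for PageRank.
graph = {"home": ["about", "blog"], "blog": ["post-1", "post-2"], "about": []}
priority = {"home": 1.0, "blog": 0.8, "about": 0.2, "post-1": 0.5, "post-2": 0.1}
print(best_first_crawl(["home"], graph, priority.get, limit=4))
```

Note that "about" is discovered first but fetched last: the queue keeps jumping to whichever known URL currently scores highest.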

Slide 69

Slide 69 text

Where Does the “Search Engines Only Crawl 5 Levels Deep” Idea Come From? A paper by IR legend and Yahoo researcher Ricardo Baeza-Yates entitled “Crawling the Infinite Web” identified that crawling only five levels deep is enough to get the most valuable content on the web. https://chato.cl/papers/baeza04_crawling_infinite_web.pdf

Slide 70

Slide 70 text

Crawl Frequency Estimation Google would love to use your dates from Schema and your lastmod from your sitemap, but they can’t trust them. So, they keep every version of your content that they crawl and make determinations on how frequently pages change to decide how often to crawl the page.

Slide 71

Slide 71 text

They May Stop Crawling Based on URL Patterns If Google believes that the URL pattern is going to yield less value if they crawl the page, they will stop crawling all URLs that fit that pattern.

Slide 72

Slide 72 text

How XML Sitemaps Come Into Play Google downloads XML sitemaps regularly via a separate crawler to update their “per site” database. That database informs the list of URLs that go to the scheduler, and it treats “differential sitemaps” with higher priority. There’s also a secondary crawler system for URLs in XML Sitemaps.

Slide 73

Slide 73 text

That’s why this works so well.

Slide 74

Slide 74 text

The Generative AI Hack to Increase Crawl for a Page A good way to improve crawl is by updating your pages regularly. An automated way to make a page change is by putting an NLG summary at the top of the page and updating it frequently.

Slide 75

Slide 75 text

Gary says there’s no crawl budget, only crawl rate limit and crawl demand.

Slide 76

Slide 76 text

Crawl Rate Limiting How often Google crawls is a function of how much load a host can handle. Increase the capacity and they will crawl more.

Slide 77

Slide 77 text

Crawl Demand This is an area where social signals used to play a heavy factor, but crawl demand is mostly a function of PageRank.

Slide 78

Slide 78 text

On Crawl Budget
Crawl Budget = Server Response Time × Time / Error Rate
= TTFB × Duration / %Server Error
= (Avg. TTFB × Duration / %Server Error) × (CTR × Average Time between page updates)
= (Avg # of Crawled URLs × Frequency) / Time
@JoriFord

Slide 79

Slide 79 text

How Google Handles Pages that Don’t Change Pages that have either explicitly or implicitly indicated that they don’t change (304 response code) are basically put on a timeout for a while and Google will reuse what it has in the index. That cache expiry refreshes on a set interval.

Slide 80

Slide 80 text

What About IndexNow? I don’t see Google joining this initiative because of the cross-search engine URL submission requirement. I could imagine them coming up with their own version of the spec though.

Slide 81

Slide 81 text

The Best Things to Do to Get More Crawl Activity
• Load Balance – Route Googlebot to its own autoscaling instances by IP
• Submit Differential Sitemaps
• Update your pages regularly
• Align lastmod with structured data date and on-page date
• Make sure your robots.txt never returns a 500
• Track your crawl budget metrics

Slide 82

Slide 82 text

How Indexing Works

Slide 83

Slide 83 text

Indexing in the High-Level Pipeline Abstraction You are here

Slide 84

Slide 84 text

Back in the Day, Google Only Indexed the First 100KB of the Page. Now They Do 15MB

Slide 85

Slide 85 text

Estimates Suggest Google Indexes 0.03% of the Web; 60% of the Web is Duplicate

Slide 86

Slide 86 text

Documents are Parsed and Stored in an Inverted Index An inverted index is like an index in a book, where each word is mapped to the documents it appears in.
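As a toy model of the idea: an inverted index is just a map from word to posting list, and a query intersects the posting lists of its terms. Document ids and titles are invented for illustration:

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each word to the set of document ids it appears in,
    like the index at the back of a book."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

docs = {
    1: "crawling the infinite web",
    2: "the anatomy of a search engine",
    3: "distributed crawling of hyperlinked documents",
}
index = build_inverted_index(docs)

# A query intersects posting lists: docs containing both terms.
print(index["crawling"] & index["the"])
```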

Slide 87

Slide 87 text

Phrase-Based Indexing was a Key Google Innovation Before Anna Patterson led the phrase-based indexing initiative, search engines built inverted indexes on single terms and then built posting lists at the intersections of terms in queries. Phrase-based indexing upended this and introduced phrase co-occurrence and predictive modeling based on those phrases.

Slide 88

Slide 88 text

Google Saves Versions of Your Pages Forever in the Document Server There are a variety of operations that Google does based on your content over time. So they have cached versions from the first time a page appeared.

Slide 89

Slide 89 text

This is Replicated Many, Many Times Over

Slide 90

Slide 90 text

Crawl Tiering Based on Update The index is stored in multiple tiers across many machines, split based on how important the page is. Super important and regularly accessed pages are stored in memory. Pages of medium importance are stored on solid state drives for fast reads. Pages that are not so important are stored on standard HDDs since they are cheap and don’t need to be fast. Distributed Crawling of Hyperlinked Documents https://patents.google.com/patent/US8812478B1/en

Slide 91

Slide 91 text

Deduplication and Canonicalization Deduplication and canonicalization are handled through a series of fingerprints and comparisons. There are many signals that inform this process, such as links, redirects, alternates, etc. Google uses a machine learning classifier to make the final canonical determination.

Slide 92

Slide 92 text

The Best Things to Improve Indexing
• Limit Duplication with More Unique content per page
• Limit your cannibalization through your anchor text
• Update your pages regularly

Slide 93

Slide 93 text

How Rendering Works

Slide 94

Slide 94 text

Rendering in the High-Level Pipeline Abstraction You are here

Slide 95

Slide 95 text

Why We Need Rendering Historically, Google could not see what happens here.

Slide 96

Slide 96 text

The Web Rendering System Closes the Gap The Web Rendering System uses a modified version of headless Chromium to render pages. It has different behaviors than a user’s browser, like how it handles randomness, dates, and service workers. It doesn’t paint pixels because there’s no reason to, but it will stop executing if a process takes up too much CPU.

Slide 97

Slide 97 text

Rendering is Separate Because It’s Computationally Expensive The WRS is not going to render every page unless it believes it’s worthwhile.

Slide 98

Slide 98 text

Websites Have Many Options for Rendering These Days Google handles SSR the best, obviously, but they can access your content with any of these models.

Slide 99

Slide 99 text

The Best Things to Improve Rendering
• SSR, if you can
• Make your pages worth rendering if you can’t
• Monitor your crawl volume vs CPU usage

Slide 100

Slide 100 text

How Processing Works

Slide 101

Slide 101 text

Processing in the High-Level Pipeline Abstraction You are here

Slide 102

Slide 102 text

Processing is Where the Magic Happens

Slide 103

Slide 103 text

Standard NLP Pipeline

Slide 104

Slide 104 text

There are Query-Dependent and Query-Independent Scores

Slide 105

Slide 105 text

Embeddings have disrupted our understanding of every part of the processing pipeline.

Slide 106

Slide 106 text

Websites as Vectors Just as there are representations of pages as embeddings, there are vectors representing websites and authors.

Slide 107

Slide 107 text

Author Vectors Similarly, Google has Author Vectors wherein they are able to identify an author and the subject matter that they discuss. This allows them to fingerprint an author and their expertise.

Slide 108

Slide 108 text

Build Your Links Contextually If you’re still building links, it’s very likely that Google has ramped up its capabilities around relevance between linked pages. They are likely discounting links between pages that are not close relevance matches anymore.

Slide 109

Slide 109 text

How Searching Works

Slide 110

Slide 110 text

Google Is Now a Series of Over 200 Microservices Running in Parallel

Slide 111

Slide 111 text

Misspellings are Fixed Of course.

Slide 112

Slide 112 text

Queries are Expanded and Substituted Based on Entities and Synonymy

Slide 113

Slide 113 text


Slide 114

Slide 114 text

Expansions Are Scored The different versions of the query are scored; they may be run in parallel, then the results are scored, and the best set is returned.

Slide 115

Slide 115 text

There is More Happening Behind the Scenes With Entities

Slide 116

Slide 116 text

Here’s an example [The Rock]

Slide 117

Slide 117 text

Here’s an example [The Rock] It’s also relevant to a movie called “The Rock”

Slide 118

Slide 118 text

[The Rock imdb] Google’s not sure what you mean here, so it’s showing both.

Slide 119

Slide 119 text

[The Rock] is expanded to [The Rock actor] in the background

Slide 120

Slide 120 text

There is More Happening Behind the Scenes With Entities

Slide 121

Slide 121 text

That’s How Results Like This Happen

Slide 122

Slide 122 text

Neural Matching to Determine the Meaning of the Query Again, with the embeddings!

Slide 123

Slide 123 text

How Ranking Works

Slide 124

Slide 124 text

Document Scoring Simplified Content Factor + Content Factor + Speed Factor + Link Factor + Link Factor = Document Score

Slide 125

Slide 125 text

Each Component to the Equation Has a Weight a·Content Factor + b·Content Factor + c·Speed Factor + d·Link Factor + e·Link Factor = Document Score

Slide 126

Slide 126 text

The Weights May Look Like This 3·Content Factor + 6·Content Factor + 1·Speed Factor + 2·Link Factor + 2·Link Factor = Document Score

Slide 127

Slide 127 text

This is What Marketers Do (3 × 5) + (6 × 2) + (1 × 4) + (2 × 95) + (2 × 74) = 369

Slide 128

Slide 128 text

So, Then, Google Turns Down the Weights on Links (3 × 5) + (6 × 2) + (1 × 4) + (0.25 × 95) + (0.01 × 74) = 55.49
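The weighted-sum scoring these slides walk through is easy to reproduce. A sketch using the slides’ factor values and weights (purely illustrative; Google’s real scoring functions are far more complex than a linear combination):

```python
def document_score(factors, weights):
    """Weighted sum of ranking factors: sum of w_i * f_i."""
    return sum(w * f for w, f in zip(weights, factors))

# Factor values from the slides: two content factors, a speed factor,
# and two link factors.
factors = [5, 2, 4, 95, 74]

print(document_score(factors, [3, 6, 1, 2, 2]))        # 369
print(document_score(factors, [3, 6, 1, 0.25, 0.01]))  # ~55.49

# Turning down the link weights collapses the score: the marketer's
# link-heavy page goes from winning to losing without any page change.
```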

Slide 129

Slide 129 text

Google Understands Queries by Breaking them Into Entities Leveraging Entities Allows Queries to be Expanded and Related Entities and Attributes to be Discovered

Slide 130

Slide 130 text

Google’s Scoring Functions There’s more than one scoring function. Google scores content and links a variety of different ways and then chooses the best results. There is not just one “algorithm.” This is why different queries seem to value signals differently.

Slide 131

Slide 131 text

Post-retrieval Adjustments In addition to there being multiple scoring functions with different results to choose from, Google may make further re-ranking adjustments based on any number of features and factors. So, really, anything could happen in the SERPs.

Slide 132

Slide 132 text

When Amit Singhal ran Google Search he was famously against using machine learning in rankings.

Slide 133

Slide 133 text


Slide 134

Slide 134 text

Machine Learning for Search Rankings Has Been Around For a Long Time

Slide 135

Slide 135 text


Slide 136

Slide 136 text

John Giannandrea from Google Brain took over and certainly did not have that bias.

Slide 137

Slide 137 text

Learning to Rank Learning to Rank is using supervised machine learning for information retrieval systems.

Slide 138

Slide 138 text

Learning to Rank Requires One of Two Things
• Human-reviewed quality scores
• Implicit user feedback

Slide 139

Slide 139 text

Google Has the Quality Rater Program The Quality Ratings are not just for evaluation. They act as the feature-engineered data that trains the learning-to-rank models.

Slide 140

Slide 140 text

Google Obviously Has Query and Click Logs

Slide 141

Slide 141 text

Enter TensorFlow Ranking

Slide 142

Slide 142 text

It’s Open Source!

Slide 143

Slide 143 text

We Keep Hearing How They Don’t Use Clicks

Slide 144

Slide 144 text

At Best, This is a Lie by Omission

Slide 145

Slide 145 text

Fascinating blog post from the Google Cloud team…

Slide 146

Slide 146 text

Here’s Google Telling On Itself

Slide 147

Slide 147 text

The inputs that we control have not changed, but our understanding of what Google is doing with them needs to.

Slide 148

Slide 148 text

What about the new Search Generative Experience (SGE)?

Slide 149

Slide 149 text

At I/O, Google Announced a Dramatic Change to Search The experimental “Search Generative Experience” brings generative AI to the SERPs and significantly changes Google’s UX.

Slide 150

Slide 150 text

Queries are Longer and the Featured Snippet is Bigger 1. The query is more natural language and no longer Orwellian Newspeak. It can be much longer than the 32 words it has been limited to historically. 2. The Featured Snippet has become the “AI snapshot,” which takes 3 results and builds a summary. 3. Users can also ask follow-up questions in conversational mode.

Slide 151

Slide 151 text

This is Called “Retrieval Augmented Generation” Neeva, Bing, and now Google’s Search Generative Experience all pull documents based on search queries and feed them to a language model to generate a response.

Slide 152

Slide 152 text

Google’s Version of this is called Retrieval-Augmented Language Model Pre-Training (REALM)

Slide 153

Slide 153 text

SGE is built from REALM + PaLM 2 and MUM MUM is the Multitask Unified Model that Google announced in 2021 as a way to do retrieval augmented generation. PaLM 2 is their latest state-of-the-art large language model.

Slide 154

Slide 154 text

If You Want More Technical Detail Check Out This Paper https://arxiv.org/pdf/2002.08909.pdf

Slide 155

Slide 155 text

It’s Experimental because it’s Error-prone Bing and ChatGPT lit a competitive fire under Google, but they have been working on these technologies for years. They were slow to release because of the various ways LLMs are likely to return disinformation.

Slide 156

Slide 156 text

The Experience May Also Pollute Search Quality The experience of a response from Google suggests that there is a person giving the response. The generative text may also conflict with other aspects returned in search.

Slide 157

Slide 157 text

Sounds cool, but how is it going to affect what we do?

Slide 158

Slide 158 text

The Search Demand Curve will Shift With the change in the level of natural language query that Google can support, we’re going to see far fewer head terms and a lot more long-tail terms.

Slide 159

Slide 159 text

The CTR Model Will Change With the search results being pushed down by the AI snapshot experience, what is considered #1 will change. We should also expect that any organic result will be clicked less and standard organic CTR will drop dramatically. However, this will likely yield query displacement.

Slide 160

Slide 160 text

Rank Tracking Will Be More Complex As an industry, we’ll need to decide what is considered the #1 result. Based on this screenshot, positions 1-3 are now the citations for the AI snapshot and #4 is below it. However, the AI snapshot loads on the client side, so rank tracking tools will need to change their approach.

Slide 161

Slide 161 text

None of this changes what we do tactically, but it may change what we do strategically.

Slide 162

Slide 162 text

The future of Content and Links Or how is generative AI going to change all of this for us?

Slide 163

Slide 163 text

Disclaimer: Correlation is not causation… but it probably is. You are here

Slide 164

Slide 164 text

We’ve Evolved Beyond Word Counts.

Slide 165

Slide 165 text

One of Singhal’s Early Innovations was Doc Length Normalization Google has always had the idea of making sure content length isn’t an overpowering factor. Amit Singhal recognized longer documents inherently outperform shorter ones in retrieval tasks, so it’s always been a fundamental thing that Google looked at.

Slide 166

Slide 166 text

Marketers Are Just Copying… People are skipping the step in the Skyscraper technique wherein they’re supposed to create “better” content.

Slide 167

Slide 167 text

This is What a Lot of Us Are Doing Now

Slide 168

Slide 168 text

Here’s Google Telling On Itself

Slide 169

Slide 169 text

Here’s Google Telling On Itself https://ipullrank.com/ai-seo-guide

Slide 170

Slide 170 text

We need to evolve beyond what is basically complex “keyword density.”

Slide 171

Slide 171 text

Soon, Everyone will be able to Generate Perfectly Optimized Content

Slide 172

Slide 172 text

In Fact, Kristin @ Fractl Built It Kristin built a tool that allows someone to put in a keyword or a topic and it will generate robust content based on what is currently ranking. https://www.frac.tl/interactives/long-form-article-generator/

Slide 173

Slide 173 text


Slide 174

Slide 174 text

This is Where Information Gain Comes Into Play Conceptually, as it relates to search engines, Information Gain is the measure of how much unique information a given document adds to the ranking set of documents. In other words, what are you talking about that your competitors are not?
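As a rough sketch of the concept only (Google’s patent scores over far richer representations than raw terms), information gain can be approximated as the set of terms a candidate document adds over the current ranking set. All documents and names below are invented for illustration:

```python
def information_gain_terms(candidate, ranking_set):
    """Terms the candidate covers that no document already in the
    ranking set mentions: a crude proxy for information gain."""
    covered = set()
    for doc in ranking_set:
        covered |= set(doc.lower().split())
    return set(candidate.lower().split()) - covered

ranking_set = [
    "how to brew coffee with a french press",
    "french press coffee brewing guide",
]
candidate = ("how to brew coffee with a french press "
             "and how water hardness changes extraction")

# The candidate's gain is the water-hardness angle nobody else covers.
print(sorted(information_gain_terms(candidate, ranking_set)))
```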

Slide 175

Slide 175 text

Google’s Information Gain Patent Google’s patent indicates that they are specifically scoring for documents that feature net new information over other documents on the same topic.

Slide 176

Slide 176 text

Information Gain is Best Driven by Looking Across the Entity Graph • Thus far, there is a very limited set of tools in the SEO space that are specifically looking at entities and their relationships. A non-SEO tool called EntiTree visualizes related entities from Wikidata. https://www.entitree.com/ Using this will give you insights into what entities are being considered for your target entity.

Slide 177

Slide 177 text

Right now, most tools are just showing you how to be a copycat.

Slide 178

Slide 178 text

It’s Not Exactly Clear What SEO Tools Are Looking At These seem to be topics, but are they entities?

Slide 179

Slide 179 text

Some Tools Are Looking Vertically at a SERP for Term Co-occurrence • While it’s possible that it may yield the same or similar results, tools like this are not looking across relationships of entities.

Slide 180

Slide 180 text

Other Tools are Mapping Topical Clusters • While this approach captures more breadth as it relates to the topic, it is not the same as reviewing entities.

Slide 181

Slide 181 text

Review the Details of the Entity in Wikidata

Slide 182

Slide 182 text

Review the Features of the Entity and Talk About it In Your Content • Ultimately, the process is the same. Work the discussed entities, their attributes, and related entities into all the relevant places in your content.

Slide 183

Slide 183 text

If it’s not an entity that Google recognizes, it’s not worth optimizing for.

Slide 184

Slide 184 text

Get Entity SEO Tools Into the Workflow

Slide 185

Slide 185 text

Quick Tool: Reviewing Entity Salience I whipped up a quick tool in Colab where you can see how entities are appearing in your own content. You can put in text, upload a file, or select a URL. Compare the usage of entities in your content with your competitors’. https://colab.research.google.com/drive/18QXrdAPoKhUl76gGzuxk_vDiUqMeRyqx?usp=sharing

Slide 186

Slide 186 text

Context-Limited Generative AI is a Huge Opportunity for SEO

Slide 187

Slide 187 text

Build a bot based on your own content

Slide 188

Slide 188 text


Slide 189

Slide 189 text

If you’re using ChatGPT, you need AIPRM for prompt management. https://www.aiprm.com

Slide 190

Slide 190 text


Slide 191

Slide 191 text

If your prompt is just one sentence, don’t be surprised when you get garbage back.

Slide 192

Slide 192 text

Remember You Need to Build Around a Content Strategy Read more about this approach at https://ipullrank.com/generative-ai-content-strategy

Slide 193

Slide 193 text

Aight, that’s enough.

Slide 194

Slide 194 text

Signing Off (for the Last Time) Roll the Credits

Slide 195

Slide 195 text

We are firmly in a semantic search environment. We need to stop operating from the lexical model.

Slide 196

Slide 196 text

The Things You Should Do
• Don’t use qualitative measures in the places where Google is using quantitative measures
• Use tools that calculate embeddings
• Improve the management of your XML sitemaps
• Leverage generative AI to scale content optimization
• Build links contextually
• Start actually using entities

Slide 197

Slide 197 text


Slide 198

Slide 198 text

Check Out FridAI It’s on AIPRM.com.

Slide 199

Slide 199 text

19 9 https://ipullrank.com/resources/seo-weekly

Slide 200

Slide 200 text


Slide 201

Slide 201 text

Mike King
Founder / CEO @iPullRank
Thank You | Q&A
[email protected]
Award Winning, #GirlDad
Download the AI Guide: https://ipullrank.com/ai-seo-guide
Use Orbitwise: https://ipullrank.com/tools/orbitwise