The New Content SEO

Google hasn't cared about keywords for a long time, and neither should you. In this talk at the Sydney SEO Conference by Prosperity Media in April 2023, I do a patent dive into how Google analyses the topic of a page and what information gain is, as well as how you can and should integrate it into your "keyword" research.

Amanda King

April 14, 2023

Transcript

  1. The New Content SEO

    What we’ll talk about
    1. A quick refresher
    2. Have keywords ever actually been a thing Google used?
    3. How Google reads content may not be what you think
    4. So what do we do about all this?
    5. Who tf am I?
  2. DEAR READERS: I’m still learning. So are you.

    If I’ve royally butchered a concept, come talk to me after. I like learning.
  3. A brief refresher on how Google crawls the Internet

    It’s three separate stages: crawl, index, serve; with sub-processes for scoring and ranking. Content analysis is included in the indexing engine, content relevancy is in the serving engine. While this is an old patent (2011), the fundamentals still apply for this reminder.
    Source: https://patents.google.com/patent/US8572075B1/, retrieved 22 Mar 2023
    https://developers.google.com/search/docs/fundamentals/how-search-works
  4. The search ranking engine works in systems

    • Query Deserves Freshness is a system
    • Helpful Content is a system
    • MUM & BERT are systems
      ◦ “Bidirectional Encoder Representations from Transformers (BERT) is an AI system Google uses that allows us to understand how combinations of words express different meanings and intent.”
    https://developers.google.com/search/docs/appearance/ranking-systems-guide
  5. Queries very quickly become entities

    “[...] identifying queries in query data; determining, in each of the queries, (i) an entity-descriptive portion that refers to an entity and (ii) a suffix; determining a count of a number of times the one or more queries were submitted” - patent granted in 2015, submitted in 2012
    Source: https://patents.google.com/patent/US9047278B1/en ; https://patents.google.com/patent/US20150161127A1/
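A rough Python sketch of the entity-plus-suffix idea quoted above. It is not the patented method: the entity list, the prefix-matching rule and the sample query log are all hypothetical. Only the idea of splitting each query into an entity-descriptive portion and a suffix, then counting how often each pair was submitted, comes from the quote.

```python
from collections import Counter

# Hypothetical entity list, for illustration only.
KNOWN_ENTITIES = ["sydney opera house", "earl grey tea"]


def split_query(query, entities=KNOWN_ENTITIES):
    """Return (entity, suffix) if the query starts with a known entity, else None."""
    q = " ".join(query.lower().split())  # lowercase and collapse whitespace
    # Try longer entities first so the most specific match wins.
    for entity in sorted(entities, key=len, reverse=True):
        if q == entity or q.startswith(entity + " "):
            return entity, q[len(entity):].strip()
    return None


def count_entity_suffixes(queries):
    """Count how many times each (entity, suffix) pair was submitted."""
    counts = Counter()
    for query in queries:
        parsed = split_query(query)
        if parsed is not None:
            counts[parsed] += 1
    return counts


# Made-up query log.
query_log = [
    "Sydney Opera House tickets",
    "sydney opera house tickets",
    "sydney opera house opening hours",
    "earl grey tea caffeine",
    "best seo conference",  # no known entity, so it is skipped
]
suffix_counts = count_entity_suffixes(query_log)
```

Here `suffix_counts` maps pairs such as `("sydney opera house", "tickets")` to the number of times they appeared in the log, which is the kind of signal that shows what people want to know about an entity rather than which strings they typed.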
  6. Google acknowledges query-only based matching is pretty terrible.

    “Direct ‘Boolean’ matching of query terms has well known limitations, and in particular does not identify documents that do not have the query terms, but have related words [...] The problem here is that conventional systems index documents based on individual terms, rather than on concepts. Concepts are often expressed in phrases [...] Accordingly, there is a need for an information retrieval system and methodology that can comprehensively identify phrases in a large scale corpus, index documents according to phrases, search and rank documents in accordance with their phrases, and provide additional clustering and descriptive information about the documents. [...]”
    - Information retrieval system for archiving multiple document versions, granted 2017 (link)
  7. So it decided to make its search engine concept- and phrase-based.

    “The system is adapted to identify phrases that have sufficiently frequent and/or distinguished usage in the document collection to indicate that they are ‘valid’ or ‘good’ phrases [...] The system is further adapted to identify phrases that are related to each other, based on a phrase's ability to predict the presence of other phrases in a document.”
    - Information retrieval system for archiving multiple document versions, granted 2017 (link)
  8. “Rather than simply searching for content that matches individual words, BERT comprehends how a combination of words expresses a complex idea.”

    Source: https://blog.google/products/search/how-ai-powers-great-search-results/
  9. MUM takes this a step further

    • About 1,000 times more powerful than BERT
    • Trained across 75 languages for greater context
    • Recognises this across different types of media (video, text, etc.)
    https://blog.google/products/search/introducing-mum/
  10. Step 1: Indexing

    Indexing is the stage where content is analysed, so how does Google do it?
  11. BERT is a technique for pre-training natural language classification.

    So how does natural language processing work, once it has a corpus of data?
    Source: https://blog.google/products/search/search-language-understanding-bert/
  12. So the broad-strokes steps in the indexation process are

    1. Parsing: tokenisation, parts of speech, stemming (for Google, lemmatisation)
    2. Topic modelling: entity detection, relation detection
    3. Understanding
    4. Onto the next engine, ranking
  13. Parsing is intrinsically categorisation

    • Semantic distance
    • Keyword-seed affinity
    • Category-seed affinity
    • Category-seed affinity to threshold
    https://patents.google.com/patent/US11106712B2; https://www.seobythesea.com/2021/09/semantic-relevance-of-keywords/
  14. How natural language processing usually works: tokenization and subwords

    Source: https://ai.googleblog.com/2021/12/a-fast-wordpiece-tokenization-system.html
  15. This gets broken down even further

    • N-grams: important to find the primary concepts of the sentence by identifying and excluding stop words
    • “Running”, “runs”, “ran” = same base: “run”
    https://patents.google.com/patent/US8423350B1/
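To make the last few slides concrete, here is a toy Python sketch of that parsing step: tokenise, drop stop words, map inflected forms to a base form, then build n-grams. The stop-word set and the lemma lookup table are tiny hand-made assumptions for illustration; real systems use far larger resources and subword tokenisers, and nothing here reflects Google's actual implementation.

```python
import re

# Tiny, hand-made resources: assumptions for this example only.
STOP_WORDS = {"the", "a", "an", "to", "in", "of", "and", "she", "he"}
LEMMAS = {"running": "run", "runs": "run", "ran": "run"}


def normalise(text):
    """Lowercase, tokenise on letters, drop stop words, map words to a base form."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [LEMMAS.get(token, token) for token in tokens if token not in STOP_WORDS]


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = normalise("She ran to the park")  # ['run', 'park']
bigrams = ngrams(tokens, 2)                # [('run', 'park')]
```

With the stop words gone and "ran" reduced to "run", the bigram that is left carries the concept of the sentence, which is the point the slide is making.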
  16. Google does a lot of things when detecting entities and relationships

    • Identifying aspects to define entities based on popularity and diversity, granted in 2011 (link)
    • Finding the entity associated with a query before returning a result, using input from human quality raters to confirm objective fact associated with an entity, granted in 2015 (link)
    • Understanding the context of the query, entity and related answer you’re searching for, granted in 2019 (link)
    • Aims to understand user generated content signals in relation to a webpage, granted in 2022 (link)
  17. Google does a lot of things when detecting entities and relationships

    • Understanding the best way to present an entity in a results page, granted in 2016 (link)
    • Managing and identifying disambiguation in entities, granted in 2016 (link)
    • Build entities through co-occurring “methodology based on phrases” and store lower information gain documents in a secondary index, granted in 2020 (link)
    • Understanding context from previous query results and behaviour, granted in 2016 (link)
  18. Step 2: Scoring

    In their own description of their ranking & scoring engine, Google offers 5 buckets:
    • Meaning
    • Relevance
    • Quality
    • Usability
    • Context
  19. Scoring is all those 200+ factors we talk about…

    Google has cited everything from internal links, external links, pogo sticking, “user behaviour”, proximity of the query terms to each other, context, attributes, and more.
    Just a few of the patents related to scoring:
    • Evaluating quality based on neighbor features (link)
    • Entity confidence (link)
    • Search operation adjustment and re-scoring (link)
    • Evaluating website properties by partitioning user feedback (link)
    • Providing result-based query suggestions (link)
    • Multi-process scoring (link)
    • Block spam blog posts with “low link-based score” (link)
  20. It actually looks like they have a classification engine for entities as well

    This patent was filed in 2010, granted in 2014. Likely a basis for the Knowledge Graph. (US8838587B1)
    https://patents.google.com/patent/US8838587B1/en
  21. “...link structure may be unavailable, unreliable, or limited in scope, thus, limiting the value of using PageRank in ascertaining the relative quality of some documents.” (circa 2005)

    https://patents.google.com/patent/US7962462B1/en
  22. How Google ranks content

    • Based on historical behaviour from similar searches in aggregate (application)
    • Based on external links (link)
    • Based on your own previous searches (link)
    • Based on whether or not it should directly provide the answer via Knowledge Graph (link)
    • Phrase- and entity-based co-occurrence threshold scores (link)
    • Understanding intent based on contextual information (link)
  23. Helpful Content Update & Information Gain Score (granted Jun 2022)

    • The information gain score might be personal to you and the results you’ve already seen
    • Featured snippets may be different from one search to another based on the information gain score of your second search
    • Pre-training an ML model on a first set of data shown to users in aggregate, getting an information gain score, and using that to generate new results in SERPs
    https://patents.google.com/patent/US20200349181A1/en
  24. What is “information gain”?

    “Information gain, as the ratio of actual co-occurrence rate to expected co-occurrence rate, is one such prediction measure. Two phrases are related where the prediction measure exceeds a predetermined threshold. In that case, the second phrase has significant information gain with respect to the first phrase.”
    - Phrase-based searching in an information retrieval system, granted 2009 (link)
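The ratio in that quote can be sketched in a few lines of Python. This is a simplified reading, not the patent's exact computation: it assumes co-occurrence is measured per document, that the expected rate is what you would see if the two phrases appeared independently of each other, and that the threshold is 1.0. The documents and the threshold are made up for illustration.

```python
def information_gain(docs, phrase_a, phrase_b):
    """Ratio of actual to expected co-occurrence rate of two phrases.

    docs is a list of sets, each holding the phrases found in one document.
    The expected rate assumes the phrases occur independently (an assumption).
    """
    total = len(docs)
    has_a = sum(phrase_a in doc for doc in docs)
    has_b = sum(phrase_b in doc for doc in docs)
    both = sum(phrase_a in doc and phrase_b in doc for doc in docs)
    if total == 0 or has_a == 0 or has_b == 0:
        return 0.0
    actual = both / total
    expected = (has_a / total) * (has_b / total)
    return actual / expected


# Made-up corpus: each set is the phrases detected in one document.
docs = [
    {"earl grey", "bergamot"},
    {"earl grey", "bergamot"},
    {"earl grey", "teapot"},
    {"teapot"},
]

RELATED_THRESHOLD = 1.0  # hypothetical cut-off, not taken from the patent

gain_bergamot = information_gain(docs, "earl grey", "bergamot")  # about 1.33
gain_teapot = information_gain(docs, "earl grey", "teapot")      # about 0.67
bergamot_is_related = gain_bergamot > RELATED_THRESHOLD
```

In this toy corpus "bergamot" turns up alongside "earl grey" more often than chance would predict, so it clears the threshold, while "teapot" does not.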
  25. So, basically, it’s quantifying to what degree you talk about all the topics Google sees as related to your main subject.
  26. If information gain is such a strong concept in how Google chooses which content to show, why do so few folks talk about it?

    https://patents.google.com/patent/US7962462B1/en
  27. Redo keyword research and overlay entities

    • Pull content for at least the top 10 search results ranking for your target keyword
    • Dump them into Diffbot (https://demo.nl.diffbot.com/) or the Natural Language AI demo (https://cloud.google.com/natural-language)
    • Note the entities and salience
    • Run your target page
    • Understand the differences
    • Update your content accordingly
  28. Start with keyword research, find co-occurring terms

    • Pull content for at least the top 10 search results ranking for your target keyword
    • Look at TF-IDF calculators to reverse engineer the topic correlation (Ryte has a paid one)
    • Note the terms included
    • Run your target page
    • Understand the differences
    • Update your content accordingly
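If you would rather not rely on a paid calculator, the workflow above can be approximated with a small Python sketch. It uses a plain textbook TF-IDF (term frequency times log of inverse document frequency), not any particular tool's formula. The function names, the `min_pages` cut-off and the sample page text are all hypothetical; in practice you would paste in the body copy of the ranking pages and your own page.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(docs):
    """Return one {term: score} dict per document using plain TF-IDF."""
    tokenised = [tokenize(doc) for doc in docs]
    n_docs = len(tokenised)
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    scores = []
    for tokens in tokenised:
        term_freq = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        })
    return scores


def missing_terms(competitor_pages, target_page, min_pages=2):
    """Terms used by at least min_pages competitors but absent from the target page."""
    scores = tf_idf(competitor_pages + [target_page])
    target_terms = set(tokenize(target_page))
    totals = Counter()
    pages_using = Counter()
    for page_scores in scores[:-1]:  # the last entry is the target page itself
        for term, score in page_scores.items():
            if term not in target_terms:
                totals[term] += score
                pages_using[term] += 1
    shared = [term for term in totals if pages_using[term] >= min_pages]
    return sorted(shared, key=lambda term: totals[term], reverse=True)


# Made-up page text standing in for the scraped top results and your own page.
competitors = [
    "earl grey tea with bergamot oil",
    "bergamot flavoured earl grey tea",
]
target = "earl grey tea guide"
gaps = missing_terms(competitors, target)  # ['bergamot']
```

Requiring a term to appear on more than one competitor page is a design choice to filter out one-off wording; loosen `min_pages` if you only have a handful of pages to compare.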
  29. Break old content habits

    • FAQ on product pages
    • Consolidate super-granularly targeted blog articles
    • Think outside of the blog folder: the semantic relationship can carry through to the directory order of the website as well
    • Internal linking can be a secret weapon
    • Fit content to purpose: not everything needs a 3,000-word in-depth article
  30. Amanda King is a human

    • Over a decade in the SEO industry
    • Traveled to 40+ countries
    • Business- and product-focussed
    • Knows CRO, Data, UX
    • Always open to learning something new
    • Slightly obsessed with tea
  31. How Google reads content

    • BERT is open source // BERT q&a demo
    • Latent Dirichlet Allocation
    • BERT or PaLM? (PaLM = LLM) or LaMDA? Or CALM
    • Recent deep learning with BERT
    • MUM