Slide 1

Slide 1 text

Ditch Keyword-Specific Pages – Build Topic Focus through Content Consolidation
Amanda King @ FLOQ
SMX Munich, 19 March 2025

Slide 2

Slide 2 text

What’s what
1. Keywords vs. topics for Google
2. The real world
3. The analysis
4. How to implement this yourself
5. Who’s this human speaking to us?

Slide 3

Slide 3 text

From https://searchengineland.com/how-google-search-ranking-works-445141

Slide 4

Slide 4 text

Have keywords ever actually been a thing Google used?

Slide 5

Slide 5 text

Google acknowledges query-only based matching is pretty terrible.
“Direct ‘Boolean’ matching of query terms has well known limitations, and in particular does not identify documents that do not have the query terms, but have related words [...] The problem here is that conventional systems index documents based on individual terms, rather than on concepts. Concepts are often expressed in phrases [...] Accordingly, there is a need for an information retrieval system and methodology that can comprehensively identify phrases in a large scale corpus, index documents according to phrases, search and rank documents in accordance with their phrases, and provide additional clustering and descriptive information about the documents. [...]”
- Information retrieval system for archiving multiple document versions, granted 2017 (link)

Slide 6

Slide 6 text

Queries very quickly become entities
“[...] identifying queries in query data; determining, in each of the queries, (i) an entity-descriptive portion that refers to an entity and (ii) a suffix; determining a count of a number of times the one or more queries were submitted”
- patent granted in 2015, submitted in 2012
Sources: https://patents.google.com/patent/US9047278B1/en ; https://patents.google.com/patent/US20150161127A1/ ; https://patents.google.com/patent/US8032507B1/en

Slide 7

Slide 7 text

Parsing is intrinsically categorisation
● Semantic distance
● Keyword-seed affinity
● Category-seed affinity
● Category-seed affinity to threshold
Sources: https://patents.google.com/patent/US11106712B2 ; https://www.seobythesea.com/2021/09/semantic-relevance-of-keywords/

Slide 8

Slide 8 text

It actually looks like they had a classification engine for entities as well
This patent was filed in 2010 and granted in 2014; likely a basis for the Knowledge Graph. (US8838587B1)
https://patents.google.com/patent/US8838587B1/en

Slide 9

Slide 9 text

“Rather than simply searching for content that matches individual words, BERT comprehends how a combination of words expresses a complex idea.”
Source: https://blog.google/products/search/how-ai-powers-great-search-results/ (2022)

Slide 10

Slide 10 text

What, then, is “information gain”?
Phrase-based searching in an information retrieval system, granted 2009 (link); “Contextual estimation of link information gain”, granted to Google in July 2024 (link)
[Diagram: for the query “Australian Shepherd”, URL 1 and URL 2 both mention “Aussie”, but URL 2 adds red merle, blue merle and tricolor]

Slide 11

Slide 11 text

But then there’s this whole concept of a consensus score
Mark Williams-Cook tested this via the Google exploit he analysed and received a bug bounty for.
Source: https://www.youtube.com/watch?v=_AQ9UDqES80

Slide 12

Slide 12 text

So for business owners online, there will always be a tension between creating unique content for customers and answering the same FAQs.

Slide 13

Slide 13 text

One way to address this is to be as direct as possible: cut out historical ‘SEO content’ and other content bloat to focus on matching user intent and topical relationships.

Slide 14

Slide 14 text

Not keywords.

Slide 15

Slide 15 text

The Real World
Crowdsourced case studies from colleagues

Slide 16

Slide 16 text

Unfocused law firm
● Over 700 thin blog posts
● Poor search intent targeting: rankings for queries unrelated to core business offerings
● Previous agency focused on audits without taking action
From Firewire Digital: https://www.firewiredigital.com.au/case-studies/burke-mead-lawyers/

Slide 17

Slide 17 text

Challenger bank in Europe wins via consolidation and authority
● 817 pages (38%) removed due to irrelevance
● 376 pages (17%) merged due to overlap
● 202 pages (9%) to improve
Direct communication with Precis Digital, Linus Alenius

Slide 18

Slide 18 text

Informational financial sites
● Site #1 pruned 15% of old content in October 2023
● Site #2 migrated (with content consolidation) in May 2023, with ongoing consolidation in 2024
[Chart: traffic trends for Site #1 and Site #2]

Slide 19

Slide 19 text

The Analysis
I wanted to see how this would all scale.
A big thank you to the SEMRush team, particularly the engineer pulling all this data for me, Yulia!

Slide 20

Slide 20 text

Across 8,421 domains, I reviewed data to see if reducing pages was a stable, sustainable choice for growth.
Google update timeline over the data window:
● 2022: August (first helpful content update); December (helpful content update, link spam update)
● 2023: March (core update); August (core update); September (helpful content update); October (core update); November (core update, reviews update)
● 2024: March (core update, spam update); June (spam update); August (core update); November (core update); December (spam update, core update)
Note: my data starts in February 2023.

Slide 21

Slide 21 text

I aimed to look at “middle of the road” websites, not super-massive ones

Classification | Average Start Pages | Average End Pages | Average Absolute Page Change | Average Relative Page Change | Average Traffic Change
MPMT | 16,130 | 26,910 | 10,780 | 81.40% | 74.11%
MPST | 11,690 | 16,504 | 4,814 | 67.65% | -6.15%
FPLT | 29,487 | 19,659 | -9,828 | -34.16% | -55.92%
SPMT | 18,860 | 20,215 | 1,355 | 12.26% | 50.78%
SPLT | 10,799 | 10,902 | 103 | 11.11% | -51.01%
MPLT | 10,090 | 14,397 | 4,307 | 76.80% | -46.09%
SPST | 15,222 | 15,532 | 310 | 10.45% | -8.91%
FPST | 32,028 | 20,756 | -11,272 | -30.72% | -10.85%
FPMT | 30,353 | 22,168 | -8,185 | -30.96% | 54.71%

(Codes: MP/FP/SP = more/fewer/same pages; MT/LT/ST = more/less/same traffic, e.g. FPMT = fewer pages, more traffic.)

Slide 22

Slide 22 text

This was an interesting way to start the analysis
Fewer websites to work with isn’t necessarily a bad thing, though

Classification | Count | Percent
Shutdown | 1,742 | 20.69%
More pages, more traffic | 1,445 | 17.16%
More pages, same traffic | 922 | 10.95%
Fewer pages, less traffic | 747 | 8.87%
Same pages, more traffic | 724 | 8.60%
Same pages, less traffic | 667 | 7.92%
More pages, less traffic | 662 | 7.86%
Same pages, same traffic | 653 | 7.75%
Fewer pages, same traffic | 468 | 5.56%
Fewer pages, more traffic | 391 | 4.64%

Slide 23

Slide 23 text

Across the board, reducing your pages gave you slightly lower volatility in traffic changes than adding more

Classification | Domains | Average Traffic Increase | Average Page Change | Average Traffic Volatility | Traffic-to-Page Ratio
MPMT | 1,445 | 74.11% | 81.40% | 25.29 | 5.86
FPMT | 391 | 54.71% | -30.96% | 22.86 | 5.93
SPMT | 724 | 50.78% | 12.26% | 21.84 | 15.67

Slide 24

Slide 24 text

When we remove media sites, this becomes even clearer: FPMT volatility sits in the 22-26 range, compared with 30+ for MPMT.

Slide 25

Slide 25 text

Yet comparing media and non-media sites, media may actually benefit more from consolidation
● On average, media domains received 42% more incremental traffic than non-media domains if they reduced their website pages
● 69% of domains that added pages were considered stable, whereas 73% of media sites that reduced pages were seen as successful

Slide 26

Slide 26 text

Reviewing specific industries shows publications and YMYL tried this and succeeded
If you’re in:
● B2B
● Medical
● Style/fashion
● Auto
You may particularly benefit from reducing your pages: these industries, on average, performed better when they reduced pages than when they added more.

Slide 27

Slide 27 text

B2B
Median page reduction: -15.99%
Median traffic increase: 103.36%
Pixels.com
Page reduction: -12.89% (961,615 → 837,710 pages)
Traffic increase: +107.76% (339,573 → 705,502)

Slide 28

Slide 28 text

Medical
Median page reduction: -8.99%
Median traffic increase: 57.11%
Stability is remarkable: most of these sites have very low volatility (3-9%), indicating consistent growth rather than erratic traffic spikes.

Slide 29

Slide 29 text

Fashion
Median page reduction: -31.95%
Median traffic increase: 93.34%
Flaunt.com
Page reduction: -16.66% (10,945 → 9,122 pages)
Traffic increase: +444.32% (48,033 → 261,451)

Slide 30

Slide 30 text

Auto
Median page reduction: -21.68%
Median traffic increase: 54.09%
whatcar.com: gradual decline from ~13K to ~8.5K pages by end of 2024, coinciding with significant traffic growth.

Slide 31

Slide 31 text

Less is more. If you’re not sure, be highly targeted in what you consolidate.
Sweet spot: a -10% to -20% reduction in content shows the highest traffic gains, at 70.74%.
Minimal page reductions (0-10%) produced substantial traffic gains (54.8%) with the highest stability rating (83.78%).

Slide 32

Slide 32 text

Large sites (10K-100K pages) achieve dramatically higher traffic gains (77.67%) compared to smaller sites, despite reducing a smaller percentage of their content (-24.24%).

Slide 33

Slide 33 text

Small sites typically require 34-51% page reduction; large sites achieve better results with only about 20% reduction

Site Size | Domains | Average Traffic Gain | Average Page Reduction | Average Volatility
Very Small (<100 pages) | 10 | 47.01% | -51.43% | 25.36
Small (100-1K pages) | 82 | 36.79% | -35.85% | 19.37
Medium (1K-10K pages) | 158 | 46.55% | -34.00% | 20.39
Large (10K-100K pages) | 121 | 77.67% | -23.65% | 27.33
Very Large (>100K pages) | 20 | 57.59% | -20.95% | 28.33

Slide 34

Slide 34 text

Based on the patterns I’m seeing, gradual, targeted page reductions (most likely via content consolidation) are the more successful approach to reducing pages.

Slide 35

Slide 35 text

Where YMYL starts, the rest of the Internet will likely follow: plan to consolidate your content by 10-20% in the next 18 months.

Slide 36

Slide 36 text

So how do we do this ourselves?
Steps to implement so you land on the more favourable end of reducing the pages on your website.

Slide 37

Slide 37 text

We know folks have failed doing this, so:
1. Find and resolve duplicate content
2. Find and resolve irrelevant content
3. Map and match user intent
4. Consolidate any and all with proper redirects, 404s or 410s (see the verification sketch below)
5. BONUS: E-E-A-T updates, particularly if in YMYL
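Step 4 is where consolidations usually break, so verify it. A minimal Python sketch for checking that each retired URL either 301s to its consolidation target or intentionally returns 404/410; the URLs in `redirect_map` are hypothetical placeholders:

```python
# Minimal redirect verification sketch; the URLs below are hypothetical placeholders.
import requests

# Map of retired URL -> intended consolidation target
redirect_map = {
    "https://example.com/old-faq-page/": "https://example.com/faq/",
    "https://example.com/thin-blog-post/": "https://example.com/topic-hub/",
}

for old_url, target in redirect_map.items():
    resp = requests.get(old_url, allow_redirects=False, timeout=10)
    location = resp.headers.get("Location")
    if resp.status_code in (301, 308) and location == target:
        print(f"OK    {old_url} -> {target}")
    elif resp.status_code in (404, 410):
        print(f"GONE  {old_url} ({resp.status_code})")  # fine if removal was intentional
    else:
        print(f"CHECK {old_url}: status {resp.status_code}, Location: {location}")
```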

Slide 38

Slide 38 text

Find your duplicate content
Do this at scale by using a combination of tools:
● Screaming Frog to crawl URLs
● BigQuery, Python and FAISS (Facebook AI Similarity Search, a library for fast nearest-neighbour search over embeddings)
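As a rough illustration of the programmatic route: a minimal sketch, assuming you’ve exported URL and body text from a Screaming Frog crawl to a CSV (the file name, column names and similarity threshold are my placeholders). It embeds each page and uses FAISS to flag each page’s nearest neighbour above a similarity threshold:

```python
# Near-duplicate detection sketch: embed pages, then nearest-neighbour search with FAISS.
import csv

import faiss
from sentence_transformers import SentenceTransformer

# Load URL + body text pairs from a crawl export (hypothetical file/column names).
urls, texts = [], []
with open("crawl_export.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        urls.append(row["Address"])
        texts.append(row["Body Text"])

model = SentenceTransformer("all-MiniLM-L6-v2")
emb = model.encode(texts, normalize_embeddings=True).astype("float32")

index = faiss.IndexFlatIP(emb.shape[1])  # inner product = cosine on unit vectors
index.add(emb)

scores, nbrs = index.search(emb, k=2)  # k=2: each page's top hit is itself
for i in range(len(urls)):
    j, score = nbrs[i, 1], scores[i, 1]  # nearest *other* page
    if score >= 0.90:  # tune this threshold per site
        print(f"Possible duplicate: {urls[i]} <-> {urls[j]} (cosine={score:.2f})")
```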

Slide 39

Slide 39 text

Or if you don’t have the time to do this programmatically yourself, use tools
Common tools include:
● The duplicate content flag in Screaming Frog (this assumes you’re able to crawl your entire website in one go)
● The “Duplicate, Google chose different canonical than user” report in Google Search Console
● Site audits in SEMRush, Ahrefs or Siteliner (free up to 250 pages)

Slide 40

Slide 40 text

Find your irrelevant content
Analyse Search Console data at scale in BigQuery. Define your terms for:
● Brand
● Product
● Topics

```sql
SELECT
  page,
  query,
  SUM(impressions) AS total_impressions,
  SUM(clicks) AS total_clicks
FROM `your_project.your_dataset.gsc_data`
WHERE NOT REGEXP_CONTAINS(LOWER(query), r'(yourbrand|brand|band)')
  AND NOT REGEXP_CONTAINS(LOWER(query), r'(product1|product2|topic1)')
GROUP BY page, query
HAVING total_impressions > 50  -- adjust thresholds as needed
ORDER BY total_impressions DESC;
```

Slide 41

Slide 41 text

But what if I’m not sure what my topics are?
Use the topics report in SEMRush (or similar) for a direction.

Slide 42

Slide 42 text

If you have more brainspace than I do, you could do this dynamically: score relevance automatically by analysing your brand proposition copy against your query dataset, using classifyText from Google’s Natural Language API.
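A minimal sketch of that idea using the `google-cloud-language` client. It assumes Google Cloud credentials are configured, and `brand_proposition.txt` and `pages` are hypothetical stand-ins for your own data; it scores each page by category overlap with your brand copy:

```python
# Relevance-scoring sketch with Google's Natural Language API (classifyText).
from google.cloud import language_v1

client = language_v1.LanguageServiceClient()

def classify(text: str) -> set[str]:
    """Return the NL API content categories for a block of text."""
    doc = language_v1.Document(
        content=text, type_=language_v1.Document.Type.PLAIN_TEXT
    )
    # classifyText needs a reasonable amount of text (roughly 20+ tokens),
    # so score page copy rather than individual short queries.
    resp = client.classify_text(request={"document": doc})
    return {category.name for category in resp.categories}

# Categories for your brand proposition copy (hypothetical file)
brand_cats = classify(open("brand_proposition.txt", encoding="utf-8").read())

# Hypothetical dict of URL -> crawled body text
pages = {"https://example.com/some-page/": "page body text ..."}
for url, body in pages.items():
    overlap = classify(body) & brand_cats
    print(url, "relevant" if overlap else "off-topic", sorted(overlap))
```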

Slide 43

Slide 43 text

How Google classifies intent within the Search Quality Evaluator Guidelines
● Know queries, some of which are Know Simple queries
● Do queries, when the user is trying to accomplish a goal or engage in an activity
● Website queries, when the user is looking for a specific website or webpage
● Visit-in-person queries, some of which are looking for a specific business or organization, some of which are looking for a category of businesses
Refresh your memory of the Search Quality Evaluator Guidelines: https://static.googleusercontent.com/media/guidelines.raterhub.com/en//searchqualityevaluatorguidelines.pdf

Slide 44

Slide 44 text

A simple regex to apply to your queries to map user intent
● Visit-in-person: (?i)(near(\sby|\sme)?|directions(\sto)?|closest|nearest|local|address|hours|location)
● Website: (?i)(.*\.(com|org|net|edu)|login|sign\sin|homepage|website)
● Do: (?i)(how\sto\s.*|download|buy|purchase|get|watch|play|stream|calculate|sign\sup|install|create|make|build)
● Know Simple: (?i)(what|who|when|where|how|why)\s.*\?|.*height.*|.*population.*|.*weather.*|.*salary.*|.*distance.*
● Know: (?i)((information|about|history|learn|guide|tutorial|facts?)\s.*|reviews?|news)
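Applied in Python, the list above becomes a small rule-based classifier. A minimal sketch; the pattern order (which decides ties, e.g. “how to …” hitting Do before Know Simple) and the outer grouping added to Know Simple are my assumptions:

```python
# Rule-based query intent classifier using the slide's regexes, checked in order.
import re

INTENT_PATTERNS = [
    ("Visit-in-person", r"(?i)(near(\sby|\sme)?|directions(\sto)?|closest|nearest|local|address|hours|location)"),
    ("Website", r"(?i)(.*\.(com|org|net|edu)|login|sign\sin|homepage|website)"),
    ("Do", r"(?i)(how\sto\s.*|download|buy|purchase|get|watch|play|stream|calculate|sign\sup|install|create|make|build)"),
    ("Know Simple", r"(?i)((what|who|when|where|how|why)\s.*\?|.*height.*|.*population.*|.*weather.*|.*salary.*|.*distance.*)"),
    ("Know", r"(?i)((information|about|history|learn|guide|tutorial|facts?)\s.*|reviews?|news)"),
]

def classify_intent(query: str) -> str:
    """Return the first intent whose pattern matches, else 'Unclassified'."""
    for intent, pattern in INTENT_PATTERNS:
        if re.search(pattern, query):
            return intent
    return "Unclassified"

# Quick check on a few sample queries
for q in ["coffee shops near me", "how to make cold brew", "asana login"]:
    print(q, "->", classify_intent(q))
```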

Slide 45

Slide 45 text

Folks have dug even deeper into how Google actually classifies intent in its search engine
Use this to classify your primary keyword set
…if you want to use it for more than a hundred or so queries, maybe pick up an AlsoAsked subscription or ping MW-C 👀
Or join the request for an API…
https://rqpredictor.streamlit.app/ ; https://www.linkedin.com/posts/markseo_seo-activity-7298698401955627008-VRQ6

Slide 46

Slide 46 text

And then, once we’ve classified the queries, we need to check whether the topic aligns with the primary user intent for the query.
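One way to make that check concrete: weight each page’s query intents by impressions and compare the dominant intent against the intent the page was designed for. A minimal pandas sketch, reusing classify_intent from the earlier regex example; the CSV names and columns are hypothetical:

```python
# Intent-alignment sketch: dominant query intent per page vs. the page's designed intent.
import pandas as pd

# Hypothetical export: one row per (page, query) with impressions
df = pd.read_csv("gsc_page_queries.csv")  # columns: page, query, impressions
df["intent"] = df["query"].map(classify_intent)

# Dominant (impression-weighted) intent per page
dominant = (
    df.groupby(["page", "intent"])["impressions"].sum()
      .sort_values(ascending=False)
      .groupby(level="page").head(1)
      .reset_index()
)

# Hypothetical mapping of each page to the intent it was designed for
page_design = pd.read_csv("page_intent_map.csv")  # columns: page, designed_intent
merged = dominant.merge(page_design, on="page")
mismatches = merged[merged["intent"] != merged["designed_intent"]]
print(mismatches)  # these pages need fixing (see next slide)
```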

Slide 47

Slide 47 text

If it doesn’t… …Fix it.

Slide 48

Slide 48 text

Unhelpful content vs. helpful content
● Unhelpful: low brand visibility, monetizes clicks, poor UX, low-effort, impersonal, easy to replicate, over-optimized SEO
● Helpful: good brand visibility, long-term audience, unobtrusive UX, high-effort, personal, hard to replicate, straightforward SEO

Slide 49

Slide 49 text

No content

Slide 50

Slide 50 text

Amanda King is human
● Fifteen years in the SEO industry
● Business- and product-focussed
● Travelled to 40+ countries
● Knows CRO, data, UX
● Always open to learning something new
● Slightly obsessed with tea

Slide 51

Slide 51 text

Thank you
LinkedIn: Amanda King, FLOQ https://www.linkedin.com/in/amandaecking/
Site: floq.co
Business course: floq.co/academy