@badams
Priority for News Publishers (Crawler)
✓ Rapid crawling of newly published articles
Optimise Crawling (1)
• Fast server response time
Load Speed
Fast response time = optimal use of Googlebot
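A rough way to keep an eye on response time is to time a handful of requests and look at the median. This is only a sketch under stated assumptions: `time_request` and `summarise` are hypothetical helper names, the timing is taken from your own machine rather than from Googlebot, and any URL you pass in is your own choice.

```python
import statistics
import time
import urllib.request


def time_request(url, timeout=10.0):
    """Return the seconds elapsed until the first body byte of `url` is read."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read(1)  # one byte is enough: headers and first byte have arrived
    return time.perf_counter() - start


def summarise(samples):
    """Summarise timing samples (in seconds) as median and worst case, in ms."""
    return {
        "median_ms": round(statistics.median(samples) * 1000, 1),
        "max_ms": round(max(samples) * 1000, 1),
    }
```

For example, `summarise([time_request(url) for _ in range(5)])` gives a quick before/after comparison when server changes are rolled out. The median is used so that a single slow outlier does not dominate the picture.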
GSC Crawl Stats
Optimise Crawling (2)
• Serve correct HTTP status codes
➢ 200 OK
➢ 301 / 302 Redirects
➢ 304 Not Modified
➢ 401 / 403 Permission Issues
➢ 404 / 410 Not Found/Gone
➢ 5xx Error
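To audit what a site is actually serving, the status codes from a crawl or a server log can be grouped into the buckets listed above. A minimal sketch, assuming the codes have already been extracted into a list of integers; `crawl_bucket` and `summarise_statuses` are hypothetical names.

```python
from collections import Counter


def crawl_bucket(status):
    """Map an HTTP status code to one of the groups listed on the slide."""
    if status == 200:
        return "ok"
    if status in (301, 302):
        return "redirect"
    if status == 304:
        return "not modified"
    if status in (401, 403):
        return "permission issue"
    if status in (404, 410):
        return "not found / gone"
    if 500 <= status <= 599:
        return "server error"
    return "other"  # anything the slide does not list


def summarise_statuses(statuses):
    """Count how many responses fall into each bucket."""
    return Counter(crawl_bucket(status) for status in statuses)
```

A sudden rise in the "server error" or "permission issue" buckets is the kind of change worth investigating first.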
Optimise Crawling (3)
• ALL resources consume crawl budget
➢ Not just HTML pages
➢ Reduce HTTP requests per page
• Google AdsBot can consume crawl budget
➢ Double-check your Google Ads campaigns
• Link equity (PageRank) impacts crawl budget
➢ More link equity = more crawl budget
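Since every resource counts, a first step is simply counting how many sub-resources a page's HTML references. The sketch below uses Python's standard `html.parser`. It only sees resources referenced directly in the HTML (external scripts, stylesheets, images, iframes), not requests triggered later by JavaScript, and `ResourceCounter` and `count_resources` are hypothetical names.

```python
from collections import Counter
from html.parser import HTMLParser


class ResourceCounter(HTMLParser):
    """Count sub-resources that the HTML itself references."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in ("script", "img", "iframe") and attributes.get("src"):
            self.counts[tag] += 1
        elif tag == "link" and attributes.get("href"):
            rel = (attributes.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self.counts["stylesheet"] += 1


def count_resources(html):
    """Return a Counter of referenced resources per type."""
    parser = ResourceCounter()
    parser.feed(html)
    parser.close()
    return parser.counts
```

Running this on an article template before and after a clean-up gives a simple number to track; `sum(count_resources(html).values())` is the total of extra requests the HTML asks for.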
2. Indexer
Crawler → Indexer → Ranker
2. Indexer
➢ Index selection
➢ HTML tokenisation & parsing
➢ Rendering (+++)
➢ Meta tag processing
➢ Canonicalisation
➢ Index sanitation
➢ Calculating PageRank
➢ Quality evaluations
➢ … ?
Priority for News Publishers (Indexer)
✓ Flawless indexing of articles
Indexing = Extraction + Semantics
Extraction
Can Google easily extract an article’s content from the DOM?
Optimise Extraction (1)
• Clean HTML;
➢ Yes, really!
➢ There is a max HTML size Google will parse
- Speculation: ~1 MB
➢ Less clutter = easier parsing
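To see how close a page gets to that speculated ceiling, the HTML's byte size can be compared against it. A minimal sketch: the 1,000,000-byte figure is only the slide's speculation, not a documented Google limit, and `html_size_report` is a hypothetical helper name.

```python
# Assumption: the slide's speculative ~1 MB, not a documented limit.
ASSUMED_LIMIT_BYTES = 1_000_000


def html_size_report(html, limit=ASSUMED_LIMIT_BYTES):
    """Report the UTF-8 byte size of an HTML string relative to `limit`."""
    size = len(html.encode("utf-8"))
    return {
        "bytes": size,
        "share_of_limit": round(size / limit, 3),
        "over_limit": size > limit,
    }
```

Size is measured in bytes rather than characters because non-ASCII characters take more than one byte in UTF-8.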
Optimise Extraction (2)
• Clean <head>;
➢ Critical meta tags high in the <head>
- Title & description
- Open Graph
- Canonical, hreflang & mobile alternate
- Structured Data
➢ Internal CSS & JS lower in the <head>
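The ordering advice can be checked mechanically. The sketch below walks the head section with Python's standard `html.parser` and lists any critical tags that only appear after inline CSS or JS. Which tags count as "critical" is an assumption taken from the list above, JSON-LD scripts are treated as structured data rather than inline JS, and `HeadOrder` and `critical_tags_after_inline` are hypothetical names.

```python
from html.parser import HTMLParser


class HeadOrder(HTMLParser):
    """Record critical tags and inline CSS/JS in the order they appear in the head."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.sequence = []  # list of (kind, label) tuples

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "head":
            self.in_head = True
            return
        if not self.in_head:
            return
        rel = (a.get("rel") or "").lower()
        prop = a.get("property") or ""
        if tag == "title":
            self.sequence.append(("critical", "title"))
        elif tag == "meta" and a.get("name") == "description":
            self.sequence.append(("critical", "description"))
        elif tag == "meta" and prop.startswith("og:"):
            self.sequence.append(("critical", prop))
        elif tag == "link" and rel in ("canonical", "alternate"):
            self.sequence.append(("critical", rel))
        elif tag == "script" and a.get("type") == "application/ld+json":
            self.sequence.append(("critical", "structured data"))
        elif tag == "style" or (tag == "script" and not a.get("src")):
            self.sequence.append(("inline", tag))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def critical_tags_after_inline(html):
    """Return critical tags that appear after the first inline style/script."""
    parser = HeadOrder()
    parser.feed(html)
    parser.close()
    seen_inline = False
    late = []
    for kind, label in parser.sequence:
        if kind == "inline":
            seen_inline = True
        elif seen_inline:
            late.append(label)
    return late
```

An empty list means every critical tag found sits above the inline CSS and JS. Anything returned is a candidate for moving higher up.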
Optimise Extraction (3)
• Uninterrupted article HTML;
➢ Article to start at headline and continue in one clean block of HTML
➢ Bells & whistles can be added via CSS and client-side JS
Semantics
Can Google understand what the article is about?
Optimise Semantics
• Well-written content;
➢ Easily identifiable entities and relationships
• Semantic HTML;
➢ Enables Google to separate style & boilerplate from content
• Structured Data;
➢ Makes page contents explicitly clear
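As an illustration of the structured data point, the sketch below builds a small schema.org `NewsArticle` JSON-LD block. The property selection is a minimal assumption, not a complete or Google-endorsed set, and `news_article_jsonld` plus all values passed to it are hypothetical.

```python
import json


def news_article_jsonld(headline, date_published, author_name, url):
    """Return a <script> tag holding minimal NewsArticle JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "NewsArticle",
        "headline": headline,
        "datePublished": date_published,  # ISO 8601 timestamp
        "author": {"@type": "Person", "name": author_name},
        "mainEntityOfPage": url,
    }
    # Escape "</" so article text can never close the script element early.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'
```

Generating the block from the same fields the CMS uses for the visible article keeps the markup and the page content in sync.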
Test Entities in Content
Google NLP API: https://cloud.google.com/natural-language
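For testing entities outside the web demo, the same API can be called over REST. The sketch below only builds the request for the `documents:analyzeEntities` method and ranks the entities in a response by salience. It assumes API-key authentication and the v1 endpoint, so check both against the current documentation; `build_entity_request` and `entity_names` are hypothetical names.

```python
import json
import urllib.request

# Assumption: v1 REST endpoint of the Cloud Natural Language API.
ENDPOINT = "https://language.googleapis.com/v1/documents:analyzeEntities"


def build_entity_request(text, api_key):
    """Build (but do not send) an entity-analysis request for `text`."""
    body = {
        "document": {"type": "PLAIN_TEXT", "content": text},
        "encodingType": "UTF8",
    }
    return urllib.request.Request(
        f"{ENDPOINT}?key={api_key}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def entity_names(response_json):
    """Return entity names from a response, most salient first."""
    entities = response_json.get("entities", [])
    ranked = sorted(entities, key=lambda e: e.get("salience", 0), reverse=True)
    return [entity["name"] for entity in ranked]
```

Sending the request with `urllib.request.urlopen` and passing the decoded JSON to `entity_names` shows which entities the API considers most central to the article. If the intended subject is not near the top, the copy may need to be more explicit.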
Core Web Vitals
Page Experience
Core Web Vitals
https://web.dev/vitals/
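Field data for the three metrics can be bucketed with a few lines of code. The thresholds below are an assumption based on the values published on web.dev for LCP, FID and CLS at the time of this deck, so verify them against the page above before relying on them. `rate_vital` is a hypothetical helper name.

```python
# Assumed thresholds: (upper bound for "good", upper bound for "needs improvement").
# LCP in seconds, FID in milliseconds, CLS is unitless.
THRESHOLDS = {
    "LCP": (2.5, 4.0),
    "FID": (100, 300),
    "CLS": (0.1, 0.25),
}


def rate_vital(metric, value):
    """Classify a Core Web Vitals measurement into one of three buckets."""
    good, needs_improvement = THRESHOLDS[metric]
    if value <= good:
        return "good"
    if value <= needs_improvement:
        return "needs improvement"
    return "poor"
```

For example, `rate_vital("LCP", 3.1)` returns `"needs improvement"` under these assumed thresholds.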
Core Web Vitals & AMP
• CWV are measured from the page version a user interacts with;
➢ This is often the AMP version
• AMP has a performance "cheat" advantage;
➢ Preloading & prerendering from the AMP Cache
• AMP no longer required for Top Stories on mobile;
➢ Does this mean non-AMP can rank?
SEO Review & Monitoring
• Little Warden
https://littlewarden.com/
• SEO Info
https://weeblr.com/doc/products.seoinfo/current/overview/
• SEOBrowse
https://seobrowse.com/