Webpages that have a high change frequency and/or are seen as highly authoritative
• News website homepages & key section pages
• Main purpose = discovery of valuable new content
  ➢ i.e., news articles
• Rarely re-crawls newly discovered URLs
  ➢ New URLs can become VIPs over time
Pages: URLs that have very little link value and/or are very rarely updated
➢ Recrawls URLs that serve 4XX errors
  - Likely also occasionally checks old redirects
➢ It is then passed on to the Regular Crawler
➢ The Regular Crawler will visit the URL several hours later
➢ Any changes made after the first crawl are unlikely to be seen until then
➢ By then the story is not news anymore; the news cycle has moved on
• Consequence:
  ➢ You usually get one chance to rank in Google’s news ecosystem
  ➢ Get your SEO right before you click ‘Publish’
• Possible exception: LiveBlogPosting articles
• Parser:
  ➢ Extracts content from HTML for indexing
• Canonicaliser:
  ➢ Determines a URL’s canonical version
• Deduplicator:
  ➢ Reduces the amount of identical content in the index
• Pageranker:
  ➢ Calculates link value (FMA PageRank) for each URL
• Many, many more…
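To make the Deduplicator idea concrete, here is a minimal sketch in Python that groups URLs serving identical content by hashing the normalised text. It illustrates the general technique only; it is not a description of how Google's own system works, and the names `content_fingerprint` and `deduplicate` are hypothetical.

```python
import hashlib
import re


def content_fingerprint(text: str) -> str:
    """Hash the text after collapsing whitespace and lowercasing it."""
    normalised = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def deduplicate(pages: dict[str, str]) -> dict[str, list[str]]:
    """Group URLs whose normalised content is identical.

    Returns a mapping of the first URL seen in each group (treated here as
    the one to keep, purely as an assumption) to the list of its duplicates.
    """
    groups: dict[str, list[str]] = {}
    for url, text in pages.items():
        groups.setdefault(content_fingerprint(text), []).append(url)
    return {urls[0]: urls[1:] for urls in groups.values()}
```

Exact-match hashing only catches identical content; near-duplicates would need a different technique.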
  ➢ Pages that need to be served quickly and frequently
  ➢ Includes news articles but also popular content
2. SSD storage
  ➢ Pages that are regularly served in SERPs but aren’t super popular
3. HDD storage
  ➢ Pages that are rarely (if ever) served in SERPs
all your critical content in the HTML source
  ➢ Don’t rely on rendering to load valuable content
2. There’s no such thing as a duplicate content penalty
  ➢ However, duplicate content on a single site means the site is competing with itself… and that’s stupid.
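One way to check the first point is to test whether critical phrases (the headline, the opening paragraph) appear in the raw HTML before any JavaScript runs. The sketch below uses Python's standard `html.parser`. The class and function names are hypothetical, and the approach only looks at text in the unrendered source.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text from raw HTML, skipping <script> and <style> contents."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only keep text that sits outside script/style elements.
        if self._skip_depth == 0:
            self.chunks.append(data)


def missing_from_source(raw_html, critical_phrases):
    """Return the phrases that do not appear in the unrendered HTML text."""
    parser = VisibleTextParser()
    parser.feed(raw_html)
    parser.close()
    text = " ".join(" ".join(parser.chunks).split()).lower()
    return [p for p in critical_phrases if " ".join(p.split()).lower() not in text]
```

Any phrase this returns is content that only exists after rendering, which is exactly what the slide advises against relying on.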
Response Time
➢ Clean URLs
  - Never use tracking parameters on internal links
  - Avoid: https://www.website.com/news/article-123?recommended=1
  - Use: https://www.website.com/news/article-123
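A minimal sketch of how internal links could be cleaned before they are written into a page, using Python's standard `urllib.parse`. The parameter list is an assumption: `recommended` comes from the example URL above, the others are illustrative, and `clean_internal_url` is a hypothetical helper name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of tracking parameters. "recommended" is taken from the
# example URL; the rest are assumptions added for illustration.
TRACKING_PARAMS = {"recommended", "ref", "utm_source", "utm_medium", "utm_campaign"}


def clean_internal_url(url: str) -> str:
    """Return the URL with known tracking parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]
    # Rebuild the URL; an empty query string drops the trailing "?".
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    print(clean_internal_url("https://www.website.com/news/article-123?recommended=1"))
    # prints https://www.website.com/news/article-123
```

Parameters that are not in the list, such as pagination, are left in place.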
HTML
➢ <h1> headlines
➢ Clean HTML in <head>
➢ Uninterrupted HTML in article <body>
➢ Good structured data
  - NewsArticle for articles
  - Person for author pages
  - Keep it lean, don’t over-annotate
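As an illustration of lean structured data, the sketch below builds a small schema.org NewsArticle object in Python and serialises it as a JSON-LD script tag. The type and property names (`NewsArticle`, `Person`, `headline`, `datePublished`, `author`, `mainEntityOfPage`) are schema.org vocabulary. Which properties to include is an assumption, and the function names are hypothetical.

```python
import json


def news_article_jsonld(headline, url, published_iso, author_name, author_url):
    """Build a deliberately small schema.org NewsArticle object."""
    return {
        "@context": "https://schema.org",
        "@type": "NewsArticle",
        "headline": headline,
        "mainEntityOfPage": url,
        "datePublished": published_iso,
        "author": {"@type": "Person", "name": author_name, "url": author_url},
    }


def to_script_tag(data):
    """Serialise the object as a JSON-LD <script> element for the page <head>."""
    payload = json.dumps(data, ensure_ascii=False, indent=2)
    # Escape "</" so text such as a headline cannot close the script element.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


if __name__ == "__main__":
    # All values below are placeholders, not real data.
    article = news_article_jsonld(
        headline="Example headline",
        url="https://www.website.com/news/article-123",
        published_iso="2024-01-01T09:00:00+00:00",
        author_name="Example Author",
        author_url="https://www.website.com/authors/example-author",
    )
    print(to_script_tag(article))
```

Keeping the object to a handful of properties follows the "keep it lean" advice; further properties can be added only where they describe something actually on the page.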
probability of ranking B
➢ Massively complicated systems
➢ Intensely competitive web
➢ All SEO is geared towards maximising probabilities
  - But… 99% probability still means 1% chance of it not happening
(good) articles on a topic determines your topic authority
• Topic authority = good visibility for your stories
• Deleting old content could undermine your topic authority
• Only delete bad content
  ➢ Age and low traffic are not enough