
Crawl like an expert: Bolster your SEO strategy with the right data

Crawling a website is a good way to collect information about the website and its pages at scale. However, if you’ve ever tried to crawl a website with a specific goal in mind, you've likely discovered that getting the right information for the right pages is not so straightforward. What are the different reasons to crawl a website? What modifications should you make to get the exact information you need?

In this webinar, Rebecca Berbel, Frédéric Gérard and Mickael Serantes will explore how to guide your crawler to do exactly what you want. We’ll examine different scenarios from website monitoring to crawling key pages to accessing protected sites with a crawler (without breaking any rules).

You’ll walk away confident that your next crawls will include the right pages to provide actionable solutions for your technical SEO needs.

Oncrawl

January 30, 2024

Transcript

  1. • Welcome! • There will be a replay — you'll

    receive it in a few days. • Feel free to ask questions in the questions tab at the bottom of your screen. We'll answer them at the end. Crawl like an expert: Bolster your SEO strategy with the right data
  2. Our panel today: Rebecca Berbel, Frédéric Gérard, Mickaël Serantes

    (Product Marketing Manager, Senior SEO Strategist, Head of Product)
  3. Industry-leading Technical SEO Data for Competitive Websites • Rich data

    and cross-analysis • Excellent scalability • Powerful segmentation • Permanent data availability and history Support your competitive digital strategy and ensure website and brand visibility on search engines. www.oncrawl.com
  4. Crawl like an expert: Bolster your SEO strategy with the

    right data • Why crawling is not one-size-fits-all • Crawling a sitemap • Crawling only part of a site • Different types of crawl when migrating a site • Efficient alerting • Q&A
  5. The 3 pillars of a consistent crawl strategy • Goal of

    the crawl • Monitor changes • Crawl regularly = your global crawl strategy
  6. What is a crawl for? • Site Structure Analysis: analyze how pages

    are linked together and whether the structure is logical and efficient • User Experience Improvement: uncover navigational issues, broken links, slow-loading pages, and other barriers that might affect the user experience • Search Engine Optimization (SEO): crawling your website helps in understanding how search engines view your site • Security and Compliance Checks: identify potential security vulnerabilities and ensure compliance with various standards and regulations
  7. Why crawl using a sitemap? • Ensuring Search Engine Accessibility

    • You can allow Oncrawl to discover sitemaps from a directory, subdomain, or URL; or you can provide the URLs of one or more sitemaps that you want to use.
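As a rough illustration of what a sitemap-based crawl starts from (a sketch, not Oncrawl's actual implementation), the Python snippet below pulls the <loc> URLs out of a single, non-index sitemap. The sitemap URL is a placeholder.

    import urllib.request
    import xml.etree.ElementTree as ET

    SITEMAP_URL = "https://www.example.com/sitemap.xml"  # placeholder sitemap URL
    NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

    def sitemap_urls(sitemap_url):
        # Download the sitemap and return every <loc> entry it lists.
        with urllib.request.urlopen(sitemap_url) as response:
            root = ET.fromstring(response.read())
        return [loc.text.strip() for loc in root.findall(".//sm:loc", NS)]

    if __name__ == "__main__":
        for url in sitemap_urls(SITEMAP_URL):
            print(url)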
  9. Why crawl part of a website • Prioritizing your key

    pages ◦ Focus your analysis on what you’re optimizing right now ◦ Monitor basic but vital information ◦ Let the crawler explore your key pages (PLPs, PDPs, articles, etc…) ◦ Extract specific information from them via scraping (see the sketch after this slide)
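For the scraping point above, here is a minimal sketch of the kind of extraction rule you might define for product pages. The URL and the itemprop selectors are assumptions about the page markup, and requests/BeautifulSoup stand in for the crawler's built-in scraping (custom fields).

    import requests                      # third-party: pip install requests beautifulsoup4
    from bs4 import BeautifulSoup

    def scrape_product(url):
        # Fetch a PDP and pull out a couple of business-critical fields.
        html = requests.get(url, timeout=10).text
        soup = BeautifulSoup(html, "html.parser")
        price = soup.select_one('[itemprop="price"]')                 # assumed markup
        availability = soup.select_one('[itemprop="availability"]')   # assumed markup
        return {
            "url": url,
            "price": price.get_text(strip=True) if price else None,
            "availability": availability.get_text(strip=True) if availability else None,
        }

    print(scrape_product("https://www.example.com/products/my_product1"))  # placeholder URL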
  10. Why crawl part of a website • Prepare your migration

    ◦ Before migration overview ◦ Follow your migration / optimizations with precision
  11. Virtual robots.txt vs. URL filtering • Virtual robots.txt pros: ◦ Override

    your live robots.txt ◦ Crawl blocked pages ◦ Crawl only some subdomains ◦ Crawl faster than the crawl-delay directive allows ◦ Test different rules on your preprod / prod environment ◦ Easy to implement and correct
  12. Virtual robots.txt vs. URL filtering • Virtual robots.txt pros: ◦ Override

    your live robots.txt ◦ Crawl blocked pages ◦ Crawl only some subdomains ◦ Crawl faster than the crawl-delay directive allows ◦ Test your rules on your preprod / prod environment ◦ Easy to implement and correct • Virtual robots.txt cons: ◦ May not crawl all your pages
  13. Virtual robots.txt vs. URL filtering • Virtual robots.txt pros: ◦ Override

    your live robots.txt ◦ Crawl blocked pages ◦ Crawl only some subdomains ◦ Crawl faster than the crawl-delay directive allows ◦ Test your rules on your preprod / prod environment ◦ Easy to implement and correct • Virtual robots.txt cons: ◦ May not crawl all your pages • URL filtering pros: ◦ Extremely powerful with regex ◦ Include AND exclude ◦ Takes all pages into account ◦ Respects your live robots.txt
  14. Virtual robots.txt vs. URL filtering • Virtual robots.txt pros: ◦ Override

    your live robots.txt ◦ Crawl blocked pages ◦ Crawl only some subdomains ◦ Crawl faster than the crawl-delay directive allows ◦ Test your rules on your preprod / prod environment ◦ Easy to implement and correct • Virtual robots.txt cons: ◦ May not crawl all your pages • URL filtering pros: ◦ Extremely powerful with regex ◦ Include AND exclude ◦ Takes all pages into account ◦ Respects your live robots.txt • URL filtering cons: ◦ Requires familiarity with regex (see the sketch after this slide)
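To make the virtual robots.txt idea concrete, the sketch below tests a hypothetical rule set, matching the PDP example on the next slides where the /category/ landing page is disallowed, with Python's standard robot parser. Note that this parser only does simple prefix matching, not Google-style wildcards, and the rules shown are an assumption for illustration, not Oncrawl's configuration.

    from urllib import robotparser

    # Hypothetical virtual robots.txt rules: block the /category/ landing pages
    # so that only the homepage and product pages get crawled.
    VIRTUAL_ROBOTS_TXT = [
        "User-agent: *",
        "Disallow: /category/",
    ]

    rp = robotparser.RobotFileParser()
    rp.parse(VIRTUAL_ROBOTS_TXT)

    for path in ["/", "/products/my_product1", "/category/cooking"]:
        url = "https://www.example.com" + path          # placeholder site
        verdict = "allowed" if rp.can_fetch("Oncrawl", url) else "disallowed"
        print(path, "->", verdict)

As the next slides show, disallowing the landing page also means products linked only from it are never discovered, which is exactly the "may not crawl all your pages" con.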
  15. 2. Crawl your PDPs with a virtual robots.txt: the homepage links to

    Promoted Product 1 (/products/my_product1) and Promoted Product 2 (/products/my_product2)
  16. 2. Crawl your PDPs with a virtual robots.txt: the homepage (allowed) links

    to Promoted Product 1 (/products/my_product1, allowed & crawled) and Promoted Product 2 (/products/my_product2, allowed & crawled)
  17. 2. Crawl your PDPs with a virtual robots.txt: the homepage (allowed) links

    to Promoted Product 1 (/products/my_product1, allowed & crawled), Promoted Product 2 (/products/my_product2, allowed & crawled), and Landing 1 (/category/cooking), which links to Product 3 (/products/my_product3) and Product 4 (/products/my_product4)
  18. 2. Crawl your PDPs with a virtual robots.txt: the homepage (allowed) links

    to Promoted Product 1 (/products/my_product1, allowed & crawled), Promoted Product 2 (/products/my_product2, allowed & crawled), and Landing 1 (/category/cooking, disallowed), which links to Product 3 (/products/my_product3) and Product 4 (/products/my_product4)
  19. 2. Crawl your PDPs with a virtual robots.txt: the homepage (allowed) links

    to Promoted Product 1 (/products/my_product1, allowed & crawled), Promoted Product 2 (/products/my_product2, allowed & crawled), and Landing 1 (/category/cooking, disallowed), which links to Product 3 (/products/my_product3, allowed but not crawled) and Product 4 (/products/my_product4, allowed but not crawled)
  20. 2. Crawl your PDPs with a virtual robots.txt: the homepage (allowed) links

    to Promoted Product 1 (/products/my_product1, allowed & crawled), Promoted Product 2 (/products/my_product2, allowed & crawled), and Landing 1 (/category/cooking, disallowed), which links to Product 3 (/products/my_product3, allowed but not crawled) and Product 4 (/products/my_product4, allowed but not crawled). Your crawl won’t be accurate!
  21. 3. Crawl your PDPs with URL Filtering: the homepage links to Promoted

    Product 1 (/products/my_product1), Promoted Product 2 (/products/my_product2), and Landing 1 (/category/cooking), which links to Product 3 (/products/my_product3) and Product 4 (/products/my_product4)
  22. 3. Crawl your PDPs with URL Filtering: the homepage (explored, not fetched)

    links to Promoted Product 1 (/products/my_product1, explored & fetched), Promoted Product 2 (/products/my_product2, explored & fetched), and Landing 1 (/category/cooking, explored, not fetched), which links to Product 3 (/products/my_product3, explored & fetched) and Product 4 (/products/my_product4, explored & fetched)
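Here is a rough sketch of the regex-based include/exclude logic behind URL filtering, under the assumption that the crawler still explores every page for link discovery and only fetches and analyzes matching URLs. The patterns are illustrative, not Oncrawl's configuration syntax, and the exclude rule for parameterized URLs is an added assumption.

    import re

    INCLUDE = re.compile(r"^/products/")       # keep the PDPs
    EXCLUDE = re.compile(r"\?")                # assumed rule: drop parameterized duplicates

    def fetch_this(path):
        # A URL is fetched and analyzed only if it matches the include pattern
        # and none of the exclude patterns.
        return bool(INCLUDE.search(path)) and not EXCLUDE.search(path)

    for path in ["/", "/category/cooking", "/products/my_product1",
                 "/products/my_product1?color=red"]:
        print(path, "->", "explored & fetched" if fetch_this(path) else "explored, not fetched")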
  23. 1. Before your migration • Full overview crawl or goal-oriented

    crawl • Different crawl profile for each goal • Crawl your staging environment • Scrape the data you need • List, extract & export useful information (pages to redirect, pages to optimize, links to change, pages without content, out-of-stock product pages, empty PLPs, etc…) • Step-by-step migration & crawl
  24. 2. During your migration (Preprod) • To crawl your sites

    hosted on a different server, like a pre-production server
  25. User agent: crawl with another user agent • Compare results for JS

    pages ◦ SSR (server-side rendering) ◦ CSR (client-side rendering) • Follow all the directives of your robots.txt to verify that it is properly configured for Google • Internally: manage security and crawlability with an IP / bot pair for certain sites (hidden preprod)
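To illustrate the user-agent point, the sketch below fetches the same page with two different User-Agent headers and compares the responses, which is a quick way to spot SSR vs. CSR differences or bot-specific behavior. The staging URL and the "MyCrawler" string are placeholders, and the check is deliberately crude (status code and HTML size only).

    import requests

    URL = "https://preprod.example.com/products/my_product1"   # placeholder staging URL
    USER_AGENTS = {
        "generic crawler": "MyCrawler/1.0",   # hypothetical crawler user agent
        "Googlebot-like": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    }

    for label, ua in USER_AGENTS.items():
        response = requests.get(URL, headers={"User-Agent": ua}, timeout=10)
        # A large gap in HTML size between user agents often hints at SSR vs. CSR differences.
        print(f"{label}: status={response.status_code}, html_bytes={len(response.content)}")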
  26. 2. During your migration (Preprod) • Crawl your staging environment

    (even non-accessible to search engines) • Compare it to your live site & spot differences using crawl comparison: ◦ Tech SEO
  27. 2. During your migration (Preprod) • Crawl your staging environment

    (even non-accessible to search engines) • Compare it to your live site & spot differences using crawl comparison: ◦ Tech SEO ◦ Internal structure ◦ Internal linking ◦ Internal popularity
  28. 2. During your migration (Preprod) • Crawl your staging environment

    (even non-accessible to search engines) • Compare it to your live site & spot differences using crawl comparison: ◦ Tech SEO ◦ Internal structure ◦ Internal linking ◦ Internal popularity ◦ Content & duplicate content
  29. 2. During your migration (Preprod) • Crawl your staging environment

    (even non-accessible to search engines) • Compare it to your live site & spot differences using crawl comparison: ◦ Tech SEO ◦ Internal structure ◦ Internal linking ◦ Internal popularity ◦ Content & duplicate content ◦ Webperf & Core Web Vitals
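As a very reduced stand-in for crawl comparison (Oncrawl's crawl-over-crawl does this natively across all of the dimensions above), the sketch below simply diffs two URL exports, one from staging and one from production. The file names and the one-URL-per-line format are assumptions.

    def load_urls(path):
        # One URL per line, blank lines ignored.
        with open(path, encoding="utf-8") as f:
            return {line.strip() for line in f if line.strip()}

    live = load_urls("crawl_live_urls.txt")        # placeholder export files
    staging = load_urls("crawl_staging_urls.txt")

    print("Only on live (at risk of disappearing):", sorted(live - staging)[:10])
    print("Only on staging (new or unexpected):", sorted(staging - live)[:10])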
  30. 3. After your migration • Run a full crawl or

    one focused on part of the site • Check your redirects by crawling your old pages (now redirected with a 301) ◦ Use the URL List crawl mode
  31. 3. After your migration • Run a full crawl or

    one focused on part of the site • Check your redirects by crawling your old pages (now redirected with a 301) ◦ Use the URL List crawl mode • Compare your data before and after the migration to spot differences or issues (Crawl over Crawl)
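In the spirit of the URL List crawl mode described above, here is a minimal sketch that checks a handful of old URLs by hand: each one should answer with a 301 whose target resolves to a 200. The URL list is a placeholder for your pre-migration export.

    import requests
    from urllib.parse import urljoin

    OLD_URLS = [                                   # placeholder pre-migration URLs
        "https://www.example.com/old-category/old-product",
        "https://www.example.com/old-landing-page",
    ]

    for url in OLD_URLS:
        r = requests.get(url, allow_redirects=False, timeout=10)
        location = r.headers.get("Location", "")
        if r.status_code != 301 or not location:
            print(f"NOT 301 (status {r.status_code}): {url}")
            continue
        target = urljoin(url, location)            # handle relative Location headers
        final_status = requests.get(target, timeout=10).status_code
        label = "OK" if final_status == 200 else f"BROKEN TARGET ({final_status})"
        print(f"{label}: {url} -> {target}")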
  32. TIPS • Create a Sitemap for your 301 pages •

    Submit it to Google through GSC ◦ Faster detection and processing of your redirects
  33. TIPS • Create a Sitemap for your 301 pages •

    Submit it to Google through GSC ◦ Faster detection and processing of your redirects • Use Sitemap crawl mode to ensure all your pages are redirected to valid pages.
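A small sketch of the tip above: generate a temporary sitemap listing the old, now-redirected URLs so it can be submitted in GSC. The URL list and the output file name are assumptions.

    import xml.etree.ElementTree as ET

    REDIRECTED_URLS = [                            # placeholder old (301) URLs
        "https://www.example.com/old-category/old-product",
        "https://www.example.com/old-landing-page",
    ]

    # Build <urlset><url><loc>...</loc></url></urlset> entries and write the file.
    urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for url in REDIRECTED_URLS:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url

    ET.ElementTree(urlset).write("sitemap-redirects.xml", encoding="utf-8", xml_declaration=True)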
  34. 5. Schedule your crawls • Schedule a daily/weekly/monthly crawl to

    automatically collect fresh data • Create custom alerts to detect any issue
  35. Alerts on specific topics • Pages returning a specific status

    code • Pages with a duplicate or missing title tag • Pages forbidden by robots.txt You can also create alerts for business cases using your own custom fields • Scrape your home page • Mandatory elements in your page description • Stock verification…
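As a rough idea of what such alerts check (Oncrawl runs these on crawl data directly, so this is only a stand-in), the sketch below scans a hypothetical CSV export with url, status and title columns and flags error status codes plus missing or duplicate titles.

    import csv

    def find_issues(csv_path):
        # Expects columns: url, status, title (an assumed export format).
        issues, seen_titles = [], {}
        with open(csv_path, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                url, status, title = row["url"], int(row["status"]), row["title"].strip()
                if status >= 400:
                    issues.append(f"{url}: returns {status}")
                if not title:
                    issues.append(f"{url}: missing title tag")
                elif title in seen_titles:
                    issues.append(f"{url}: duplicate title with {seen_titles[title]}")
                else:
                    seen_titles[title] = url
        return issues

    for issue in find_issues("latest_crawl.csv"):   # placeholder export file
        print("ALERT:", issue)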
  36. To summarize: have a global crawl strategy • Understand the

    context of your crawl and create a dedicated crawl profile • Simplify your daily life by creating scheduled crawls for the different crawl profiles you use • Stay informed about changes or issues related to your site by creating alerts
  37. Crawling: Top takeaways You don't always have to crawl your

    full site! • Crawl the pages in your sitemaps • Monitor basic but vital information • Explore your key pages only (PLPs, PDPs, articles, etc…)
  38. Crawling: Top takeaways You don't always have to crawl your

    full site! • Crawl the pages in your sitemaps • Monitor basic but vital information • Explore your key pages only (PLPs, PDPs, articles, etc…) Use different settings to target different site sections • Robots.txt • URL filtering
  39. Crawling: Top takeaways You don't always have to crawl your

    full site! • Crawl the pages in your sitemaps • Monitor basic but vital information • Explore your key pages only (PLPs, PDPs, articles, etc…) Use different settings to target different site sections • Robots.txt • URL filtering Use different settings and scopes depending on the context • Authentication • User-Agents • List mode (example: lets you check lists of redirects)
  40. Crawling: Top takeaways You don't always have to crawl your

    full site! • Crawl the pages in your sitemaps • Monitor basic but vital information • Explore your key pages only (PLPs, PDPs, articles, etc…) Use different settings to target different site sections • Robots.txt • URL filtering Use different settings and scopes depending on the context • Authentication • User-Agents • List mode (example: lets you check lists of redirects) Always monitor changes • Regular crawls • Different teams can be alerted
  41. Crawling: Next steps Are you: Looking at the right parts

    of your website? Capturing the right information at the right time? Taking "snapshots" frequently enough? Showing the right changes to the right people?