Crawl like an expert: Bolster your SEO strategy with the right data

Crawling a website is a good way to collect information about the website and its pages at scale. However, if you’ve ever tried to crawl a website with a specific goal in mind, you've likely discovered that getting the right information for the right pages is not so straightforward. What are the different reasons to crawl a website? What modifications should you make to get the exact information you need?

In this webinar, Rebecca Berbel, Frédéric Gérard and Mickaël Serantes will explore how to guide your crawler to do exactly what you want. We’ll examine different scenarios, from website monitoring to crawling key pages to accessing protected sites with a crawler (without breaking any rules).

You’ll walk away confident that your next crawls will include the right pages to provide actionable solutions for your technical SEO needs.

Oncrawl

January 30, 2024

Transcript

  1. Crawl like an expert:
    Bolster your SEO strategy
    with the right data

  2. ● Welcome!
    ● There will be a replay — you'll receive it in a few days.
    ● Feel free to ask questions in the questions tab at the
    bottom of your screen.
    We'll answer them at the end.
    Crawl like an expert: Bolster your SEO
    strategy with the right data

  3. Our panel today
    ● Rebecca Berbel, Product Marketing Manager
    ● Frédéric Gérard, Head of Product
    ● Mickaël Serantes, Senior SEO Strategist

  4. Industry-leading Technical SEO Data for Competitive Websites
    Support your competitive digital strategy and ensure website
    and brand visibility on search engines.
    ● Rich data and cross-analysis
    ● Excellent scalability
    ● Powerful segmentation
    ● Permanent data availability and history
    www.oncrawl.com

  5. Crawl like an expert: Bolster your SEO
    strategy with the right data
    ● Why crawling is not one-size-fits-all
    ● Crawling a sitemap
    ● Crawling only part of a site
    ● Different types of crawl when migrating a site
    ● Efficient alerting
    ● Q&A

  6. Crawling isn't
    one-size-fits-all

  7. How to ensure
    a good crawl strategy
    with Oncrawl?

  8. The 3 pillars of a consistent crawl strategy
    ● Goal of the crawl
    ● Crawl regularly
    ● Monitor changes
    Together, these form your global crawl strategy.

  9. What is a crawl for?
    ● Site Structure Analysis: you can analyze how pages are linked
    together and whether the structure is logical and efficient.
    ● User Experience Improvement: it can uncover navigational issues,
    broken links, slow-loading pages, and other barriers that might
    affect the user experience.
    ● Search Engine Optimization (SEO): crawling your website helps you
    understand how search engines view your site.
    ● Security and Compliance Checks: identifying potential security
    vulnerabilities and ensuring compliance with various standards
    and regulations.

  10. Start with a
    new crawl

  11. Start with the
    crawl profile
    configuration

  14. Crawling the URLs
    in a sitemap

  15. Why crawl using a sitemap?
    ● Ensuring Search Engine
    Accessibility
    ● You can allow Oncrawl to
    discover sitemaps from a
    directory, subdomain, or URL; or
    you can provide the URLs of one
    or more sitemaps that you want
    to use.
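
The sitemap option above can be illustrated with a short sketch of what the crawler actually reads: the <loc> entries of a sitemap file. This is a hypothetical, standard-library-only example (the sitemap content and URLs are invented); Oncrawl performs this discovery for you.

```python
# Extract the URLs listed in a <urlset> sitemap document.
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def urls_from_sitemap(xml_text: str) -> list[str]:
    """Return the <loc> values found in a sitemap XML document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc")]

# A minimal, invented sitemap standing in for a real /sitemap.xml.
sample = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/products/my_product1</loc></url>
</urlset>"""

print(urls_from_sitemap(sample))
```

A sitemap-based crawl covers exactly this list of URLs, which is why it is a good proxy for "the pages you want search engines to index".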

  17. Crawling only
    part of a site

  18. Why crawl part of a website?
    ● Prioritizing your key pages
    ○ Focus your analysis on what you’re optimizing right now
    ○ Monitor basic but vital information
    ○ Let the crawler explore your key pages (PLPs, PDPs, articles, etc.)
    ○ Extract specific information from them (scraping)

  19. Why crawl part of a website?
    ● Prepare your migration
    ○ Get an overview before the migration
    ○ Follow your migration / optimizations with precision

  20. Virtual robots.txt vs. URL Filtering

  21. Virtual robots.txt
    Pros
    ● Override your live robots.txt
    ● Crawl blocked pages
    ● Crawl only some subdomains
    ● Crawl faster than the speed set in the crawl delay
    ● Test different rules on your Preprod / Prod environment
    ● Easy to implement / correct
    URL Filtering

  22. Virtual robots.txt
    Pros
    ● Override your live robots.txt
    ● Crawl blocked pages
    ● Crawl only some subdomains
    ● Crawl faster than the speed set in the crawl delay
    ● Test your rules on your Preprod / Prod environment
    ● Easy to implement / correct
    Cons
    ● May not crawl all your pages
    URL Filtering

  23. Virtual robots.txt
    Pros
    ● Override your live robots.txt
    ● Crawl blocked pages
    ● Crawl only some subdomains
    ● Crawl faster than the speed set in the crawl delay
    ● Test your rules on your Preprod / Prod environment
    ● Easy to implement / correct
    Cons
    ● May not crawl all your pages
    URL Filtering
    Pros
    ● Extremely powerful with regex
    ● Include AND exclude rules
    ● Takes all pages into account
    ● Respects your live robots.txt

  24. Virtual robots.txt
    Pros
    ● Override your live robots.txt
    ● Crawl blocked pages
    ● Crawl only some subdomains
    ● Crawl faster than the speed set in the crawl delay
    ● Test your rules on your Preprod / Prod environment
    ● Easy to implement / correct
    Cons
    ● May not crawl all your pages
    URL Filtering
    Pros
    ● Extremely powerful with regex
    ● Include AND exclude rules
    ● Takes all pages into account
    ● Respects your live robots.txt
    Cons
    ● Requires familiarity with regex
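
To make the trade-off concrete, here is a hypothetical virtual robots.txt that could replace the live file for the duration of a crawl, so that only product pages are fetched (the paths are illustrative):

```
User-agent: *
Allow: /products/
Disallow: /
```

As the following slides show, this approach has a blind spot: once category pages are disallowed, the crawler can no longer follow the links they contain, so products reachable only through them are never discovered.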

  25. 1. Your staging is Disallowed

  26. 2. Crawl your PDPs with a virtual robots.txt

  27. 2. Crawl your PDPs with a virtual robots.txt
    ● Homepage
    ● Promoted Product 1 (/products/my_product1)
    ● Promoted Product 2 (/products/my_product2)

  28. 2. Crawl your PDPs with a virtual robots.txt
    ● Homepage: allowed
    ● Promoted Product 1 (/products/my_product1): allowed & crawled
    ● Promoted Product 2 (/products/my_product2): allowed & crawled

  29. 2. Crawl your PDPs with a virtual robots.txt
    ● Homepage: allowed
    ● Promoted Product 1 (/products/my_product1): allowed & crawled
    ● Promoted Product 2 (/products/my_product2): allowed & crawled
    ● Landing 1 (/category/cooking)
    ● Product 3 (/products/my_product3)
    ● Product 4 (/products/my_product4)

  30. 2. Crawl your PDPs with a virtual robots.txt
    ● Homepage: allowed
    ● Promoted Product 1 (/products/my_product1): allowed & crawled
    ● Promoted Product 2 (/products/my_product2): allowed & crawled
    ● Landing 1 (/category/cooking): disallowed
    ● Product 3 (/products/my_product3)
    ● Product 4 (/products/my_product4)

  31. 2. Crawl your PDPs with a virtual robots.txt
    ● Homepage: allowed
    ● Promoted Product 1 (/products/my_product1): allowed & crawled
    ● Promoted Product 2 (/products/my_product2): allowed & crawled
    ● Landing 1 (/category/cooking): disallowed
    ● Product 3 (/products/my_product3): allowed but not crawled
    ● Product 4 (/products/my_product4): allowed but not crawled

  32. 2. Crawl your PDPs with a virtual robots.txt
    ● Homepage: allowed
    ● Promoted Product 1 (/products/my_product1): allowed & crawled
    ● Promoted Product 2 (/products/my_product2): allowed & crawled
    ● Landing 1 (/category/cooking): disallowed
    ● Product 3 (/products/my_product3): allowed but not crawled
    ● Product 4 (/products/my_product4): allowed but not crawled
    Your crawl won’t be accurate!

  33. C0 - Public
    What’s the solution?
    Crawl your product pages

  34. What’s the solution?
    Crawl your product pages
    → Use URL filtering instead

  35. 3. Crawl your PDPs with URL Filtering

  36. 3. Crawl your PDPs with URL Filtering
    ● Homepage
    ● Landing 1 (/category/cooking)
    ● Promoted Product 1 (/products/my_product1)
    ● Promoted Product 2 (/products/my_product2)
    ● Product 3 (/products/my_product3)
    ● Product 4 (/products/my_product4)

  37. 3. Crawl your PDPs with URL Filtering
    ● Homepage: explored, not fetched
    ● Landing 1 (/category/cooking): explored, not fetched
    ● Promoted Product 1 (/products/my_product1): explored & fetched
    ● Promoted Product 2 (/products/my_product2): explored & fetched
    ● Product 3 (/products/my_product3): explored & fetched
    ● Product 4 (/products/my_product4): explored & fetched
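
The explored-vs-fetched distinction can be sketched as a small include/exclude filter (the regex patterns and paths are hypothetical, and Oncrawl's actual filtering syntax may differ): every discovered URL is still parsed for links, but only matching URLs are fetched and analyzed.

```python
# Sketch of include/exclude URL filtering with regular expressions.
import re

INCLUDE = re.compile(r"^/products/")  # fetch PDPs only...
EXCLUDE = re.compile(r"\?sort=")      # ...but skip sorted variants

def should_fetch(path: str) -> bool:
    """True if the path passes the include filter and no exclude filter."""
    return bool(INCLUDE.search(path)) and not EXCLUDE.search(path)

discovered = [
    "/",
    "/category/cooking",
    "/products/my_product1",
    "/products/my_product3?sort=price",
]
print([p for p in discovered if should_fetch(p)])
```

Unlike the virtual robots.txt approach, the homepage and landing pages are still explored here, so every product stays discoverable.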

  38. Crawling during a
    migration

  39. 1. Before your migration
    ● Full overview crawl or goal-oriented crawl
    ● Different crawl profile for each goal
    ● Crawl your staging environment
    ● Scrape the data you need
    ● List, extract & export useful information (pages to redirect, pages to
    optimize, links to change, pages without content, out-of-stock product
    pages, empty PLPs, etc.)
    ● Step-by-step migration & crawl

  40. 2. During your migration (Preprod)
    ● Crawl your staging environment (even if it’s not accessible to
    search engines)

  41. 2. During your migration (Preprod)
    ● Using specific credentials


  44. 2. During your migration (Preprod)
    ● To crawl your sites hosted on a different server, like a pre-production
    server
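
The credential-based options above boil down to sending an authentication header with every request. Here is a minimal sketch of HTTP Basic auth with invented credentials (real setups may instead use forms or tokens):

```python
# Build the Authorization header a crawler sends to a protected staging site.
import base64

def basic_auth_header(user: str, password: str) -> dict[str, str]:
    """Encode user:password as an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return {"Authorization": f"Basic {token}"}

# Hypothetical credentials for a password-protected preprod.
headers = basic_auth_header("seo-crawler", "s3cret")
print(headers)
```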

  45. What’s new about
    Oncrawl's
    crawl configuration feature?

  46. User Agent:
    Crawl with another user agent
    ● Test results with JS pages
    ○ SSR (Server side rendering)
    ○ CSR (Client side rendering)
    ● Follow all the directives of the robots.txt file to verify
    that it is properly configured for Google.
    ● Internally: security and crawlability management on
    an IP/Bot pair for certain sites (hidden preprod)
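
The robots.txt verification mentioned above can be sketched with the standard library's parser: feed it a set of rules (invented here) and ask how it treats Googlebot versus another user agent.

```python
# Verify how a robots.txt treats different user agents.
from urllib.robotparser import RobotFileParser

# Invented rules: only Googlebot may crawl, and the preprod stays hidden.
rules = """
User-agent: Googlebot
Disallow: /preprod/

User-agent: *
Disallow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

print(parser.can_fetch("Googlebot", "/"))           # homepage is open to Googlebot
print(parser.can_fetch("Googlebot", "/preprod/x"))  # hidden preprod is not
print(parser.can_fetch("MyCrawler", "/"))           # other bots are blocked
```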

  47. User Agent:
    Crawl with another user agent

  48. 2. During your migration (Preprod)
    ● Crawl your staging environment (even if it’s not accessible to search engines)
    ● Compare it to your live site & spot differences using crawl comparison:
    ○ Tech SEO

  49. 2. During your migration (Preprod)
    ● Crawl your staging environment (even if it’s not accessible to search engines)
    ● Compare it to your live site & spot differences using crawl comparison:
    ○ Tech SEO
    ○ Internal structure
    ○ Internal linking
    ○ Internal popularity

  50. 2. During your migration (Preprod)
    ● Crawl your staging environment (even if it’s not accessible to search engines)
    ● Compare it to your live site & spot differences using crawl comparison:
    ○ Tech SEO
    ○ Internal structure
    ○ Internal linking
    ○ Internal popularity
    ○ Content & duplicate content

  51. 2. During your migration (Preprod)
    ● Crawl your staging environment (even if it’s not accessible to search engines)
    ● Compare it to your live site & spot differences using crawl comparison:
    ○ Tech SEO
    ○ Internal structure
    ○ Internal linking
    ○ Internal popularity
    ○ Content & duplicate content
    ○ Webperf & Core Web Vitals

  52. 3. After your migration
    ● Run a full crawl, or one focused on part of the site
    ● Check your redirections by crawling your old pages (now redirected with a 301)
    ○ Use the URL List crawl mode

  53. 3. After your migration
    ● Run a full crawl, or one focused on part of the site
    ● Check your redirections by crawling your old pages (now redirected with a 301)
    ○ Use the URL List crawl mode
    ● Compare your data before and after the migration to spot differences or issues
    (Crawl over Crawl)
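
The redirect check in list mode can be sketched as follows; the `fetch` function is injected so the logic is independent of any HTTP client, and the sample responses are invented.

```python
# Post-migration check: each old URL should 301 once to a page answering 200.
def check_redirect(old_url: str, fetch) -> bool:
    """True if old_url 301-redirects to a target that returns 200."""
    status, location = fetch(old_url)
    if status != 301 or not location:
        return False
    final_status, _ = fetch(location)
    return final_status == 200

# Simulated responses standing in for real HTTP requests.
responses = {
    "/old/product1": (301, "/products/my_product1"),
    "/products/my_product1": (200, None),
    "/old/product2": (301, "/gone"),
    "/gone": (404, None),
}
fake_fetch = lambda url: responses[url]

print(check_redirect("/old/product1", fake_fetch))
print(check_redirect("/old/product2", fake_fetch))
```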

  54. TIPS
    ● Create a Sitemap for your redirected pages

  55. TIPS
    ● Create a Sitemap for your 301 pages
    ● Submit it to Google through GSC
    ○ Faster detection and processing of your redirects

  56. TIPS
    ● Create a Sitemap for your 301 pages
    ● Submit it to Google through GSC
    ○ Faster detection and processing of your redirects
    ● Use Sitemap crawl mode to ensure all your pages are redirected to valid pages.
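
Building that sitemap of redirected URLs is mechanical. A minimal sketch (the old URL is invented):

```python
# Build a sitemap listing the *old* (now-redirected) URLs so Google
# re-crawls them and picks up the 301s faster.
from xml.sax.saxutils import escape

def redirect_sitemap(old_urls: list[str]) -> str:
    """Render a list of URLs as a <urlset> sitemap document."""
    entries = "\n".join(
        f"  <url><loc>{escape(u)}</loc></url>" for u in old_urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</urlset>"
    )

xml = redirect_sitemap(["https://www.example.com/old/product1"])
print(xml)
```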

  57. 5. Schedule your crawls
    ● Schedule a daily/weekly/monthly crawl to automatically collect fresh data

  58. 5. Schedule your crawls
    ● Schedule a daily/weekly/monthly crawl to automatically collect fresh data
    ● Create custom alerts to detect any issue

  59. Crawling to drive
    monitoring

  60. Efficient alerting
    ● Website monitoring
    ● Quality assurance
    ● Business cases with custom fields

  61. Alerts on specific topics
    ● Pages returning a specific status code
    ● Pages with a duplicate or missing title tag
    ● Pages forbidden by robots.txt
    You can also create alerts for business cases using your own
    custom fields:
    ● Scrape your home page
    ● Mandatory elements in your page description
    ● Stock verification…
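
A custom alert rule of this kind is essentially a scan over crawl results. A sketch with invented page data: flag missing titles, and titles shared by more than one page.

```python
# Alert rule sketch: detect missing and duplicate title tags.
from collections import Counter

# Hypothetical crawl results: URL -> extracted <title>.
pages = {
    "/": "Home | Example Shop",
    "/products/my_product1": "My Product 1 | Example Shop",
    "/products/my_product2": "My Product 1 | Example Shop",  # duplicate
    "/category/cooking": "",                                 # missing
}

missing = sorted(url for url, title in pages.items() if not title)
counts = Counter(pages.values())
duplicates = sorted(
    url for url, title in pages.items() if title and counts[title] > 1
)

print("missing titles:", missing)
print("duplicate titles:", duplicates)
```

Run on a schedule after each crawl, a check like this is what turns regular crawling into monitoring.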

  62. To summarize: have a global crawl strategy
    ● Understand the context of your crawl and create a dedicated crawl
    profile
    ● Simplify your daily life by creating scheduled crawls for the
    different crawl profiles you use
    ● Stay informed about changes or issues related to your site by
    creating alerts

  63. Crawling:
    Top takeaways

  64. Crawling: Top takeaways
    ● Goal of the crawl
    ● Crawl regularly
    ● Monitor changes
    Together, these form your global crawl strategy.

  65. Crawling: Top takeaways
    You don't always have to crawl your full site!
    ● Crawl the pages in your sitemaps
    ● Monitor basic but vital information
    ● Explore your key pages only (PLPs, PDPs, articles, etc.)

  66. Crawling: Top takeaways
    You don't always have to crawl your full site!
    ● Crawl the pages in your sitemaps
    ● Monitor basic but vital information
    ● Explore your key pages only (PLPs, PDPs, articles, etc.)
    Use different settings to target different site sections
    ● Robots.txt
    ● URL filtering

  67. Crawling: Top takeaways
    You don't always have to crawl your full site!
    ● Crawl the pages in your sitemaps
    ● Monitor basic but vital information
    ● Explore your key pages only (PLPs, PDPs, articles, etc.)
    Use different settings to target different site sections
    ● Robots.txt
    ● URL filtering
    Use different settings and scopes depending on the context
    ● Authentication
    ● User-Agents
    ● List mode (example: lets you check lists of redirects)

  68. Crawling: Top takeaways
    You don't always have to crawl your full site!
    ● Crawl the pages in your sitemaps
    ● Monitor basic but vital information
    ● Explore your key pages only (PLPs, PDPs, articles, etc.)
    Use different settings to target different site sections
    ● Robots.txt
    ● URL filtering
    Use different settings and scopes depending on the context
    ● Authentication
    ● User-Agents
    ● List mode (example: lets you check lists of redirects)
    Always monitor changes
    ● Regular crawls
    ● Different teams can be alerted

  69. Crawling: Next steps
    Are you:
    ● Looking at the right parts of your website?
    ● Capturing the right information at the right time?
    ● Taking "snapshots" frequently enough?
    ● Showing the right changes to the right people?

  70. Any questions?
    (ask them in the questions tab)

  71. Thank you for your attention
    Book your demo
    www.oncrawl.com
