
Crawling, Indexation and Logfiles

Andor Palau

October 04, 2024

Transcript

  1. "And then, if you think about it, one thing that we do and we might not need to do that much is refresh crawls." 20 January 2022 5 https://www.seroundtable.com/google-crawling-more-efficient-environmental-friendly-32792.html
  2. "And often, we can't estimate this well, and we definitely have room for improvement there on refresh crawls, because sometimes, it just seems wasteful that we are hitting the same URL over and over again." 20 January 2022 6 https://www.seroundtable.com/google-crawling-more-efficient-environmental-friendly-32792.html
  3. @andorpalau #brightonseo 9 https://www.contentkingapp.com/academy/crawl-budget/ Why crawl budget should be optimised: 1. The primary goal: getting content changes processed as quickly as possible after they have been made. 2. Newly published content is crawled quickly and subsequently indexed. 3. Best possible management of resources such as CSS, JavaScript, or images is also an important goal of crawl budget optimization.
  4. @andorpalau #brightonseo 11 These are file types that can be indexed by Google: .pdf, .ps, .csv, .kml, .kmz, .gpx, .hwp, .htm, .html, .xls, .xlsx, .ppt, .pptx, .doc, .docx, .odp, .ods, .odt, .rtf, .svg, .tex, .txt, .text, .bas, .c, .cc, .cpp, .cxx, .h, .hpp, .cs, .java, .pl, .py, .wml, .wap, .xml, .bmp, .gif, .jpeg, .png, .webp, .svg, .3gp, .3g2, .asf, .avi, .divx, .m2v, .m3u, .m3u8, .m4v, .mkv, .mov, .mp4, .mpeg, .ogv, .qvt, .ram, .rm, .vob, .webm, .wmv, and .xap https://developers.google.com/search/docs/crawling-indexing/indexable-file-types?hl=en
  5. "After crawling, Google can already decide that the URL does

    not run through the processing if the HTML is bad, and the problem is not that Google renders JS slowly, but that the HTML is bad."
  6. @andorpalau #brightonseo 23 Summary: What is a log file? Servers and computer applications of all kinds usually generate a so-called log entry automatically when they perform an action. This entry is written to a file. If a bot crawls a URL, this creates an entry in the log file. We analyse these entries afterwards to understand how the bot moved around the domain. www.oncrawl.com:80 66.249.73.145 - - [07/Feb/2018:17:06:04 +0000] "GET /blog/ HTTP/1.1" 200 14486 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-" Important information from the log file: • Host • IP address • User agent • Date • URL • Status code
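
     To make the example entry above concrete, here is a minimal Python sketch (not from the deck) that parses this kind of vhost + combined log line and extracts exactly the fields listed on the slide: host, IP address, user agent, date, URL and status code. The regex is an assumption based on the single example line; real formats vary with the server configuration, and Googlebot user agents should additionally be verified via reverse DNS because they can be spoofed.

     ```python
     import re

     # Assumed pattern for the vhost + combined log format shown on the slide.
     LOG_PATTERN = re.compile(
         r'^(?P<host>\S+)\s+'                          # virtual host, e.g. www.oncrawl.com:80
         r'(?P<ip>\S+)\s+\S+\s+\S+\s+'                 # client IP, identd, auth user
         r'\[(?P<date>[^\]]+)\]\s+'                    # timestamp
         r'"(?P<method>\S+)\s+(?P<url>\S+)[^"]*"\s+'   # request line (method + URL)
         r'(?P<status>\d{3})\s+\S+\s+'                 # status code, bytes sent
         r'"[^"]*"\s+'                                 # referrer
         r'"(?P<user_agent>[^"]*)"'                    # user agent
     )

     line = ('www.oncrawl.com:80 66.249.73.145 - - [07/Feb/2018:17:06:04 +0000] '
             '"GET /blog/ HTTP/1.1" 200 14486 "-" '
             '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"')

     match = LOG_PATTERN.match(line)
     if match:
         entry = match.groupdict()
         # Keep only hits that identify themselves as Googlebot.
         if "Googlebot" in entry["user_agent"]:
             print(entry["date"], entry["method"], entry["url"], entry["status"])
     ```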
  7. @andorpalau #brightonseo 29 Oncrawl / *last 45 days: only 35% are linked & 14% of these generate clicks
  8. @andorpalau #brightonseo 48 Some Key Take-Aways • Crawling & logfile analyses are still extremely exciting and helpful in 2024 for bigger sites. • Combine as much data as possible so that you can compare the sources with each other. • Segment your data: directories, URL types, authors, temporal aspects, etc. This allows you to gain much more insight from your data (see the segmentation sketch below). • Look especially for inefficiencies: don't just let static recommendations pile up on you; think about what would and wouldn't make sense in your context and check this against the data.
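
     As an illustration of the segmentation take-away, here is a small Python sketch (not part of the deck) that groups crawled URLs by their top-level directory so bot activity can be compared per site section. The URLs and segment labels are invented for the example; in practice the URLs would come from the parsed log entries above.

     ```python
     from collections import Counter
     from urllib.parse import urlparse

     def segment(url: str) -> str:
         """Return the top-level directory of a URL as its segment label."""
         parts = [p for p in urlparse(url).path.split("/") if p]
         return "/" + parts[0] + "/" if parts else "/"

     # Hypothetical crawl hits; in a real analysis these come from the log file.
     crawled_urls = [
         "https://example.com/blog/post-1",
         "https://example.com/blog/post-2",
         "https://example.com/products/item-42",
         "https://example.com/",
     ]

     hits_per_segment = Counter(segment(u) for u in crawled_urls)
     for seg, hits in hits_per_segment.most_common():
         print(f"{seg}: {hits} bot hits")
     ```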