Infinite Loops, Dirty Architecture and Crawl Rank

INFINITE LOOPS, DIRTY ARCHITECTURE & CRAWL RANK, by Dawn Anderson

CAME TO THIS INDUSTRY VIA A DIFFERENT ROUTE

I decided to add an additional dimension to the site TO ‘EXPLODE’ NATURAL SEARCH TRAFFIC

1.5 Million URLs

Crawl Rate Going Down, Indexation Levels Going Up

GOOGLE: only crawling 0.1% of our pages per day
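To put that crawl rate in perspective, here is a rough back-of-envelope sketch. It assumes the 1.5 million URLs and the 0.1% daily figure describe the same site, and that Googlebot never revisits a page before it has seen every other page once (a simplification; real crawling revisits popular pages). The function name is made up for illustration.

```python
def days_for_full_crawl(total_urls: int, daily_crawl_fraction: float) -> tuple[int, float]:
    """Return (pages crawled per day, days needed to touch every URL once).

    Assumes no URL is crawled twice before all URLs have been crawled once.
    """
    pages_per_day = round(total_urls * daily_crawl_fraction)
    days = total_urls / pages_per_day
    return pages_per_day, days


pages_per_day, days = days_for_full_crawl(1_500_000, 0.001)
print(f"{pages_per_day} pages per day, about {days:.0f} days for one full pass")
```

Under those assumptions a single full pass takes about 1,000 days, which is why the number of crawlable URLs matters so much.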

Infinite Loop Definition: An infinite loop is a sequence of instructions in a computer program which loops endlessly, either due to the loop having no terminating condition, having one that can never be met, or one that causes the loop to start over.

PENGUIN & PANDA updates came along

TOO MANY URLS = SEO DEATH. ‘WE’RE ALL DOOMED’

CRAWL BUDGET: Roughly proportionate to PageRank. Pages with a lot of links get crawled more. Still applies in current search landscape.

CRAWL RANK: A ranking metric for ‘no’ to ‘low’ PageRank pages?? Pages crawled more often rank higher. Get ‘low’ to ‘no’ PageRank pages crawled more than competitors = YOU WIN

CRAWL OPTIMISATION: FIND OUT WHERE Googlebot goes AND KEEP WATCHING
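One way to see where Googlebot goes is to count its requests in the server access logs. The sketch below assumes logs in Combined Log Format and groups hits by top-level site section and status code; the sample lines, paths and function name are invented for illustration. It trusts the user-agent string, which can be spoofed, so a real check would also verify the requesting IP (for example by reverse DNS).

```python
import re
from collections import Counter

# Combined Log Format tail: "METHOD /path PROTO" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines):
    """Count requests per (top-level section, status) whose user agent mentions Googlebot."""
    hits = Counter()
    for line in lines:
        m = LOG_LINE.search(line)
        if m is None or "Googlebot" not in m.group("agent"):
            continue
        path = m.group("path").split("?", 1)[0]            # ignore query strings
        section = "/" + path.strip("/").split("/", 1)[0]   # e.g. /shoes
        hits[(section, m.group("status"))] += 1
    return hits


GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample = [
    f'66.249.66.1 - - [10/Oct/2014:13:55:36 +0000] "GET /shoes/red-boots?sort=price HTTP/1.1" 200 5120 "-" "{GOOGLEBOT_UA}"',
    f'66.249.66.1 - - [10/Oct/2014:13:55:40 +0000] "GET /shoes/blue-heels HTTP/1.1" 200 4980 "-" "{GOOGLEBOT_UA}"',
    f'66.249.66.1 - - [10/Oct/2014:13:56:02 +0000] "GET /dresses/old-style HTTP/1.1" 404 312 "-" "{GOOGLEBOT_UA}"',
    '203.0.113.9 - - [10/Oct/2014:13:56:10 +0000] "GET /shoes/red-boots HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows NT 6.1) Firefox/32.0"',
]

for (section, status), count in sorted(googlebot_hits(sample).items()):
    print(section, status, count)
```

Running this on a schedule and comparing the counts over time is one way to "keep watching": sections that Googlebot rarely visits, or that return many non-200 responses, stand out quickly.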

CHECK & MONITOR for over-indexation: 500 Page Website, 50,00 URLs in Google. YOU MAY HAVE DODGY CODE

Check THOROUGHLY, Name & Categorise XML Sitemaps: yoursite.sitemap.xml, Shoes.sitemap.xml, Dresses.sitemap.xml, tshirts.sitemap.xml

DON’T BE AFRAID of hard 404’s. Use 410’s where you can. AVOID soft 404’s

ENSURE THAT dynamic variables / parameters are checked for validation. Don’t render to just any old thing with a ‘200 OK’ response code or return a soft 404. HOW WILL YOU KNOW IF THERE’S A PROBLEM? You won’t

AVOID A ‘JUMBLE SALE’ BUT

‘Herd’ Googlebot: use robots.txt, nofollows, sitemaps, nav paths & cross-module internal linking

Get Those Low Level Pages Crawled, Often, Whichever way you can. Pass equity to Siblings as Well as children

Visit the internal links section on GWT (Google Webmaster Tools): Most Important Page 1, Most Important Page 2, Most Important Page 3. IS THIS YOUR BLOG?? HOPE NOT

CANONICALISATION: In web search and search engine optimization (SEO), URL canonicalization deals with web content that has more than one possible URL. Having multiple URLs for the same web content can cause problems for search engines, specifically in determining which URL should be shown in search results. Example:
• http://wikipedia.com
• http://www.wikipedia.com
• http://www.wikipedia.com/
• http://www.wikipedia.com/?source=asdf
All of these URLs point to the homepage of Wikipedia, but a search engine will only consider one of them to be the canonical form of the URL. (source: Wikipedia)

Deal Well With Duplicate & Near-Duplicate Content: via canonicalization, 301’s & content build-out

STOP LYING & ‘GET FRESH’: Genuine ‘last modified’ dates are ALL important. FORGET PRIORITY
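A minimal sketch of a categorised sitemap that carries real last-modified dates and leaves out the priority element. The URLs and dates are placeholders; in practice each date should come from the moment the page content last changed (for example a CMS "updated at" field), never from the time the sitemap was generated.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from (url, last_modified_date) pairs, without <priority>."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, modified in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = modified.isoformat()  # YYYY-MM-DD
    return ET.tostring(urlset, encoding="unicode")


# Placeholder content for a hypothetical shoes.sitemap.xml
shoes_pages = [
    ("http://www.example.com/shoes/red-boots", date(2014, 9, 30)),
    ("http://www.example.com/shoes/blue-heels", date(2014, 6, 12)),
]
print(build_sitemap(shoes_pages))
```

Building one such file per category (shoes, dresses, t-shirts) keeps each sitemap small enough to check for indexation problems section by section.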

"It's not that Google will penalize you, it's the opportunity cost for dirty architecture based on a finite crawl budget" (A.J. Kohn, Blind Five Year Old). REMEMBER THIS

Me @dawnieando
