Google Cache Crawl 404 Errors

// March 13th 2009 // Rant + SEO

When is an error not really an error?

The other day Google Webmaster Tools informed one of my clients that it had found over 1,000 404 errors. Numerous Google folks (including Maile Ohye) have told me that an excess of 404s will adversely impact SEO.

In support of this thesis, and to help track down these renegade links, Google has relatively new functionality that tells you which pages a 404 was linked from. Thank you Google.

Very quickly I realized that all of the ‘linked from’ pages were cached pages. Pages with a discovery date of between four and six months ago. Internal pages. Pages that have since changed. In fact, pages that no longer have a link to the dead page.

No thank you Google.

Clearly it would have been nice if this client had 301 redirected all of these URLs. But when doing a major architecture change you’re often going to orphan a number of URLs. It happens. And if you’ve retired the links internally, and no external links existed, the pages essentially disappear.

Unless you’re crawling an out-of-date copy of the page.

Of course you can request a URL removal via Google Webmaster Tools. But am I really going to do this for 1,000 pages? It’s painful even if I can narrow it down using a directory or subdirectory.
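
To see whether a directory-level removal is even practical, it helps to count how the 404 URLs cluster. What follows is only a rough sketch: the function name `count_by_directory`, the `depth` parameter, and the example.com URLs are all made up for illustration, and it assumes the last path segment of each URL is the page itself.

```python
from collections import Counter
from urllib.parse import urlparse


def count_by_directory(urls, depth=1):
    """Count 404 URLs by their leading directory prefix.

    Hypothetical helper: assumes the final path segment is the page
    and everything before it is a directory.
    """
    counts = Counter()
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        # Drop the page itself, then keep at most `depth` directories.
        directories = segments[:-1][:depth]
        prefix = "/" + "/".join(directories) + ("/" if directories else "")
        counts[prefix] += 1
    return counts


if __name__ == "__main__":
    # Placeholder URLs; in practice these would come from the 404 report.
    dead_urls = [
        "http://www.example.com/products/old/widget-a.html",
        "http://www.example.com/products/old/widget-b.html",
        "http://www.example.com/about-us.html",
    ]
    for prefix, total in count_by_directory(dead_urls).most_common():
        print(total, prefix)
```

If most of the dead URLs fall under one or two prefixes, a directory removal covers them in a handful of requests. If they are scattered, you are back to doing it page by page.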

Instead I can implement 301 redirects for the offending URLs. All for the sole purpose of ensuring that a cached crawl of internal pages doesn’t trip a 404.
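
For a sense of what that involves, here is a minimal sketch that turns an old-to-new mapping into Apache mod_alias `Redirect 301` lines. It assumes the site runs on Apache, which the post does not say, and the mapping, the function name `build_redirect_rules`, and the example.com URLs are invented for illustration.

```python
def build_redirect_rules(mapping):
    """Return one Apache mod_alias 'Redirect 301' line per retired path.

    `mapping` maps an old URL path to the absolute URL that replaces it.
    Both sides are hypothetical; the real pairs depend on the new architecture.
    """
    rules = []
    for old_path, new_url in sorted(mapping.items()):
        rules.append(f"Redirect 301 {old_path} {new_url}")
    return rules


if __name__ == "__main__":
    # Placeholder mapping of retired paths to their replacements.
    retired = {
        "/products/old/widget-a.html": "http://www.example.com/widgets/a/",
        "/products/old/widget-b.html": "http://www.example.com/widgets/b/",
    }
    print("\n".join(build_redirect_rules(retired)))
```

The output can be pasted into a virtual host config or an .htaccess file. The hard part is not the syntax but deciding where each of the 1,000 orphaned URLs should point.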

Both of these options seem unnecessary.

If Google finds a 404 in a cached page, why wouldn’t they seek out the original to verify that the problem currently exists? It seems like an easy business rule to implement and would likely reduce the volume of URL removal requests.
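
The rule could be as simple as the sketch below: before reporting a 404, parse the current version of the ‘linked from’ page and check whether it still links to the dead URL. This is only an illustration of the idea, not a claim about how Google's crawler works. The names `LinkCollector` and `still_links_to` are hypothetical, and fetching the live page is left out so the check stays a pure function.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collect the absolute targets of every <a href> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL and drop #fragments.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.add(absolute)


def still_links_to(live_html, page_url, dead_url):
    """Return True if the live copy of the page still links to dead_url."""
    collector = LinkCollector(page_url)
    collector.feed(live_html)
    return urldefrag(dead_url)[0] in collector.links


if __name__ == "__main__":
    # Placeholder page content; the cached copy links to the dead page, the live one does not.
    page = "http://www.example.com/index.html"
    dead = "http://www.example.com/old/page.html"
    cached_html = '<a href="/old/page.html">Old page</a>'
    live_html = '<a href="/new/page.html">New page</a>'
    print(still_links_to(cached_html, page, dead))
    print(still_links_to(live_html, page, dead))
```

Under this rule, a 404 would only be reported when the live page still carries the link. A stale link found only in a cached copy would be dropped.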

Is it that easy or am I missing something?

Comments About Google Cache Crawl 404 Errors

// 1 comment so far.

  1. Jeff Swanson // April 06th 2009

    I agree with you completely. I have a messy situation similar to what you described above. Even worse, believe it or not. Webmaster Tools gives some great insight, but the cache situation is killing me. However, even without the cache, I still have some difficult work ahead.

    Please keep me updated with what action you take. I’m guessing I’m going to lose a lot of external links, but there’s not much more that can be done. I’m going to slowly remove links from internal pages and see where that gets me first. Then, I guess I’ll have to start requesting removal from the index. Not sure how else it can be done.
