Why is Google Webmaster Tools reporting a massive increase in 404s?
-
Several weeks back, we launched a new website, replacing a legacy system and moving it to a new server. With the site transition, we broke some of the old URLs, but that didn't seem to be too much of a concern. We blocked the ones I knew should be blocked in robots.txt, 301 redirected as much of the duplicate data as I could and used canonical tags where appropriate (which is still an ongoing process), and simply returned 404 for any others that should never really have been there.
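For illustration, the kind of rules we put in place look roughly like this (all paths here are placeholders, not our real URLs):

```
# robots.txt -- block legacy sections that should never be crawled (example paths)
User-agent: *
Disallow: /legacy-admin/
Disallow: /print/

# .htaccess -- 301 old URLs that have a clear new equivalent (example paths)
Redirect 301 /old-catalog/widget.html /products/widget/
Redirect 301 /about-old.html /about/
```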
For the last few months, I've been monitoring the 404s Google reports in Webmaster Tools (WMT), and while we had a few hundred due to the gradual removal of duplicate data, I wasn't too concerned. I've been generating updated sitemaps for Google multiple times a week with any updated URLs. Then WMT started to report a massive increase in 404s, somewhere around 25,000 404s per day (making it impossible for me to keep up). The sitemap.xml contains only the new URLs, but it seems that Google is still using the old sitemap from before the launch. The reported sources of the 404s (in WMT) no longer exist; they all come from the old site.
I attached a screenshot showing the drastic increase in 404s. What could possibly cause this problem?
-
Thank you for both responses...
Nakul--
I have been following everything exactly as you have described. In general, the goal during development was to keep changes to an absolute minimum. This has not always been possible.
The majority of external links have been 301 redirected, or, in cases where the new server responds to two different URLs for the same content, a canonical tag has been added.
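For reference, the canonical tag we add in those duplicate-URL cases looks roughly like this (the URL is just an example):

```html
<!-- placed in the <head> of both URL variants, pointing at the preferred version -->
<link rel="canonical" href="http://www.example.com/products/widget/" />
```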
I have noticed that 99% of the reported URLs are former internal links. The reported 404s are completely out of proportion (194k vs. fewer than 5k pages in the new XML sitemap).
I am really worried. Is there anything else I can do besides monitoring and hoping?
How long does it typically take for things to "work their way out of its system"?
Is it possible that Google is somehow still accessing the old IP address (even though the DNS records for the domain have changed)? We left the old server alive and plan to shut it down after the second site has been moved off it.
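One way I could check that is to scan the old server's access log for Googlebot requests; a rough sketch of what I have in mind (the log path and combined log format are assumptions about a standard Apache setup):

```python
# Count which URLs Googlebot is still requesting from the old server
# (assumes a standard Apache "combined" access log; path is a placeholder)
import re
from collections import Counter

hits = Counter()
with open("/var/log/apache2/access.log") as log:
    for line in log:
        if "Googlebot" in line:
            match = re.search(r'"(?:GET|HEAD) (\S+)', line)
            if match:
                hits[match.group(1)] += 1

for url, count in hits.most_common(20):
    print(count, url)
```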
Thanks,
Adam
-
Agreed; it could be an after-effect that stems from inbound URLs to your site from other sites. That's where the majority of the 404s I see in GWT come from (as opposed to bad pages within my own site).
Google probably isn't using the old sitemap if you gave them a new one. What could be happening is that it still needs to "reorganize" and reconcile your old URLs and new URLs. The indexed pages don't just disappear overnight or get replaced immediately because of a sitemap change. Things have to work their way out of its system.
If there are specific URLs you want to try to remedy immediately, look into the GWT Remove URLs option under the Optimization section.
-
What I'd suggest doing is randomly reviewing some of those 404s that appear and checking whether they should indeed be 404s. Are there any bulk rules / wildcard 301s you can implement to redirect that traffic for 3-6 months?
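For example, if a whole section of old URLs maps cleanly onto a new one, a single pattern-based rule can handle all of them at once. A rough Apache mod_rewrite sketch (the paths are illustrative, not specific to your site):

```apache
# .htaccess -- send everything under the old /catalog/ path to the new /products/ section
RewriteEngine On
RewriteRule ^catalog/(.*)$ /products/$1 [R=301,L]
```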
These URLs are usually found via external links to your website. When you click through to the details of any of the reported 404s, it tells you what the error details are, whether the link is in the sitemap, and where it is linked from. You'll find that in most cases it's linked from somewhere. If it's an internal link, correct it. If it's external, do you think the webmaster would update it if you contacted them, or is it easier to just set up a 301 and retain the SEO value?
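If you export the reported 404s from GWT, a quick script can also tell you which of them still return 404 and which already redirect, so you can prioritise the fixes. A minimal sketch (Python; the export file name is a placeholder):

```python
# Re-check the current status code of each URL exported from the GWT crawl errors report
import requests

with open("gwt_404_export.txt") as f:  # placeholder: one URL per line
    urls = [line.strip() for line in f if line.strip()]

for url in urls:
    try:
        response = requests.head(url, allow_redirects=False, timeout=10)
        print(response.status_code, url)
    except requests.RequestException as exc:
        print("ERR", url, exc)
```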
I hope this helps.