Thousands of 404 Pages Indexed - Recommendations?
-
Background: I have a newly acquired client who has had a lot of issues over the past few months.
He had a major problem with broken dynamic URLs that fell into infinite loops because of redirects and relative links. His previous SEO didn't pay attention to the sitemaps created by a backend generator, which caused hundreds of thousands of useless pages to be indexed.
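To illustrate how relative links can create an endless URL space, here is a minimal sketch. It assumes (hypothetically) that every broken page served the same relative link `widgets/` and answered with a 200, so a crawler keeps resolving the link against an ever-deeper URL. The domain and paths are made up for the example.

```python
from urllib.parse import urljoin


def crawl_relative_chain(start_url: str, relative_href: str, steps: int) -> list[str]:
    """Follow the same relative link repeatedly, the way a crawler would if
    every generated page contained it (hypothetical example URLs)."""
    urls = [start_url]
    current = start_url
    for _ in range(steps):
        # A relative href resolves against the current directory, so each hop
        # appends another path segment instead of returning to the same page.
        current = urljoin(current, relative_href)
        urls.append(current)
    return urls


for url in crawl_relative_chain("http://www.example.com/shop/widgets/", "widgets/", 3):
    print(url)
```

Every hop yields a new, distinct URL, which is how a site with a handful of real pages can end up with hundreds of thousands of indexed ones.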
All of these useless pages displayed a 404 page that didn't return a 404 server response (it returned a 200), which created a ton of duplicate content and bad links through relative linking.
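A "404 page with a 200 response" like this is often called a soft 404. One rough way to spot them in your own crawl data is a heuristic like the sketch below; the marker phrases are hypothetical and would need to match the text your error template actually shows.

```python
# Hypothetical phrases that appear on the site's error template.
DEFAULT_MARKERS = ("page not found", "error 404")


def is_soft_404(status: int, body: str, markers=DEFAULT_MARKERS) -> bool:
    """Heuristic: the page *looks* like an error page but was served with 200."""
    text = body.lower()
    return status == 200 and any(marker in text for marker in markers)


print(is_soft_404(200, "<h1>Page Not Found</h1>"))  # error text, wrong status
print(is_soft_404(404, "<h1>Page Not Found</h1>"))  # correct: real 404 status
```

This only flags candidates; a normal page that happens to mention one of the marker phrases would be a false positive.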
Now here I am, cleaning up this mess. I've fixed the 404 page so it returns a 404 server response, and Google Webmaster Tools is now reporting thousands of "not found" errors, which is a great start. I've fixed all the site errors that caused infinite redirects, and I cleaned up the sitemap and submitted it.
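The fix described above (keeping the friendly error page but sending a real 404 status) can be sketched as a plain WSGI app. This is only an illustration: `KNOWN_PAGES`, `NOT_FOUND_HTML` and `status_for` are hypothetical names, and a real site would do this in its own framework or server config.

```python
# Hypothetical set of URLs that should exist, e.g. taken from the cleaned sitemap.
KNOWN_PAGES = {
    "/": "<h1>Home</h1>",
    "/products/": "<h1>Products</h1>",
}

NOT_FOUND_HTML = '<h1>Page not found</h1><p>Try the <a href="/">home page</a>.</p>'


def app(environ, start_response):
    path = environ.get("PATH_INFO", "/")
    if path in KNOWN_PAGES:
        status, body = "200 OK", KNOWN_PAGES[path]
    else:
        # Same friendly error page for visitors, but with a real 404 status,
        # so search engines know the URL does not exist.
        status, body = "404 Not Found", NOT_FOUND_HTML
    data = body.encode("utf-8")
    start_response(status, [("Content-Type", "text/html; charset=utf-8"),
                            ("Content-Length", str(len(data)))])
    return [data]


def status_for(path):
    """Call the app directly (no network) and return (status line, body bytes)."""
    captured = []
    body = b"".join(app({"PATH_INFO": path}, lambda status, headers: captured.append(status)))
    return captured[0], body


print(status_for("/products/")[0])
print(status_for("/shop/widgets/widgets/")[0])
```

To try it locally, `wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()` from the standard library will serve it.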
When I search site:www.(domainname).com, I still get an insane number of pages that no longer exist.
My question: how does Google handle all of these 404s? My client wants all the bad pages removed now, but I don't have much control over that. Getting Google to drop pages that return a 404 is a slow process, and he is still steadily dropping in the rankings.
Is there a way of speeding up the process? It's not reasonable to enter tens of thousands of pages into the URL Removal Tool.
I want to clean house and have Google just index the pages in the sitemap.
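To see exactly which indexed URLs fall outside the cleaned sitemap, a small comparison script can help. This is a sketch under assumptions: the list of indexed or crawled URLs has to come from somewhere else (for example a crawl-error export or server logs), and the sample URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard sitemap namespace from the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml: str) -> set[str]:
    """Return every <loc> value in a <urlset> sitemap."""
    root = ET.fromstring(sitemap_xml)
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text}


def urls_outside_sitemap(indexed: list[str], sitemap_xml: str) -> list[str]:
    """URLs that are known to search engines but not listed in the sitemap."""
    keep = sitemap_urls(sitemap_xml)
    return sorted(url for url in set(indexed) if url not in keep)


SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/products/</loc></url>
</urlset>"""

indexed = [
    "http://www.example.com/",
    "http://www.example.com/products/products/",
    "http://www.example.com/?sessionid=123",
]
for url in urls_outside_sitemap(indexed, SAMPLE_SITEMAP):
    print(url)
```

The output is the list of URLs that should be returning 404s or 301s; spot-checking a sample of them confirms the server is answering correctly.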
-
Yeah, all of the 301s are done, but I'm trying to avoid submitting tens of thousands of URLs to the URL Removal Tool.
-
Make sure you pay special attention to implementing rel canonical correctly. Here is how Google has described it:
When rel canonical was first introduced, we wanted to be a little careful. We didn’t want to open it up to potential abuse, so you could only use rel canonical within one domain. The only exception was that you could use it between IP addresses and domains.
But over time we didn’t see people abusing it much, and if you think about it, if some malicious hacker has hacked your website and is going to do something to you, he’s probably going to put some malware on the page or do a 301 redirect. He’s probably not patient enough to add a rel canonical and then wait for it to be recrawled and reindexed and all that sort of stuff.
So we saw that there didn’t seem to be a lot of abuse. Most webmasters use rel canonical in really smart ways. We didn’t see a lot of people accidentally shooting themselves in the foot, which is something we do have to worry about, so a little while after rel canonical was introduced we added support for cross-domain rel canonical.
It works essentially like a 301 redirect. If you can do a 301 redirect, that is still preferred, because every search engine knows how to handle those, and new search engines will know how to process 301s and permanent redirects.
But we do take a rel canonical, and if it’s on one domain and points to another domain, we will typically honor that. We always reserve the right to hold back if we think the webmaster is doing something wrong or making a mistake, but in general we will almost always abide by it.
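For pages that can't be 301 redirected, the canonical tag can be generated from the requested URL. The sketch below assumes, hypothetically, that query parameters on these dynamic URLs never change the page content, so it drops the query string and fragment; the optional `host` argument illustrates the cross-domain case described above. `canonical_link` is an illustrative name, not part of any tool.

```python
from html import escape
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def canonical_link(url: str, host: Optional[str] = None) -> str:
    """Build a <link rel="canonical"> tag for a dynamic URL.

    Assumption: query strings and fragments never change the content, so they
    are dropped. Pass `host` to point the canonical at another domain.
    """
    parts = urlsplit(url)
    netloc = (host or parts.netloc).lower()
    canonical = urlunsplit((parts.scheme, netloc, parts.path or "/", "", ""))
    # Escape the URL for safe use inside an HTML attribute.
    return f'<link rel="canonical" href="{escape(canonical, quote=True)}">'


print(canonical_link("http://WWW.Example.com/products/?sessionid=abc#top"))
print(canonical_link("http://old.example.com/a?x=1", host="www.example.net"))
```

If some parameters do change the content (pagination, filters), they would need to be kept rather than stripped wholesale.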
Hope that helps.
I have a client who unfortunately had a dispute with her prior IT person, and that person made a mess of the site. It is not a quick process, and I do agree that 301 redirects are by far the quickest way to go about it. If URLs that now return 404 errors still have links passing link juice, you're going to want to 301 redirect those scattered URLs to the most relevant pages.
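One way to keep those 301s in a single place is a redirect map checked before the normal site handler. This is a minimal WSGI sketch; the `REDIRECTS` entries, `not_found_app` and `with_redirects` are hypothetical stand-ins for the real site and its old-to-new URL mapping.

```python
# Hypothetical mapping of old URLs (that still have links) to the most relevant page.
REDIRECTS = {
    "/old-widgets.php": "http://www.example.com/products/widgets/",
    "/shop/widgets/widgets/": "http://www.example.com/products/widgets/",
}


def not_found_app(environ, start_response):
    """Stand-in for the real site: everything else is a genuine 404."""
    body = b"Page not found"
    start_response("404 Not Found", [("Content-Type", "text/plain; charset=utf-8"),
                                     ("Content-Length", str(len(body)))])
    return [body]


def with_redirects(inner_app, redirects):
    """Wrap a WSGI app so mapped paths get a permanent (301) redirect."""
    def app(environ, start_response):
        target = redirects.get(environ.get("PATH_INFO", "/"))
        if target is not None:
            start_response("301 Moved Permanently", [("Location", target),
                                                     ("Content-Length", "0")])
            return [b""]
        return inner_app(environ, start_response)
    return app


site = with_redirects(not_found_app, REDIRECTS)
```

Keeping the map as data makes it easy to review which old URLs point where, and anything not in the map falls through to a real 404.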
http://jamesmartell.com/matt-cutts/how-does-google-handle-not-found-pages-that-do-not-return-a-404/
http://www.seroundtable.com/404-links-google-15427.html
http://support.google.com/customsearch/bin/topic.py?hl=en&topic=11493&parent=1723950&ctx=topic
https://developers.google.com/custom-search/docs/indexing
https://developers.google.com/custom-search/docs/api
I hope I was of help to you,
Thomas
-
Have you redirected (301) to appropriate landing pages? After redirecting, use the URL Removal Tool. It works great for me: it shows results within 24 hours and removes all the URLs I have submitted from Google's index.