Thousands of 404 Pages Indexed - Recommendations?
-
Background: I have a newly acquired client who has had a lot of issues over the past few months.
He had a major issue with broken dynamic URLs that started infinite loops because of redirects and relative links. His previous SEO didn't pay attention to the sitemaps created by a backend generator, and that caused hundreds of thousands of useless pages to be indexed.
These useless pages all displayed the 404 page, but it was served with a 200 response instead of a 404 server response. That created a ton of duplicate content and bad links (relative linking).
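To illustrate how relative links on a page that always answers 200 can snowball into endless URLs, here is a minimal sketch. The domain, the path, and the helper name `crawl_trap` are hypothetical, made up for this illustration; it only assumes the error template contains a relative link such as `widgets/`.

```python
from urllib.parse import urljoin


def crawl_trap(start_url, relative_href, depth):
    """Resolve the same relative link `depth` times, the way a crawler
    would if every URL answered 200 with the same template."""
    urls = [start_url]
    for _ in range(depth):
        # Each "page" resolves the link against its own URL,
        # so the path grows by one segment per hop.
        urls.append(urljoin(urls[-1], relative_href))
    return urls


if __name__ == "__main__":
    for url in crawl_trap("http://www.example.com/products/", "widgets/", 3):
        print(url)
```

Once the error page returns a real 404, a crawler has no reason to keep following links beyond the first bad URL.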
Now here I am, cleaning up this mess. I've fixed the 404 page so it returns a 404 server response. Google Webmaster Tools is now reporting thousands of "not found" errors, which is a great start. I fixed all of the site errors that caused infinite redirects, cleaned up the sitemap, and submitted it.
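For anyone reading along, the difference between the old behaviour and the fixed one is only the status line sent with the error page. A minimal sketch as a bare WSGI app follows; `KNOWN_PATHS`, `NOT_FOUND_BODY`, and `status_for` are hypothetical names for this example and say nothing about how the client's site is actually built.

```python
# Hypothetical site content, for illustration only.
KNOWN_PATHS = {"/": b"<h1>Home</h1>"}

NOT_FOUND_BODY = b"<h1>Sorry, that page does not exist.</h1>"


def app(environ, start_response):
    """WSGI app: known paths get 200, everything else gets a real 404."""
    path = environ.get("PATH_INFO", "/")
    body = KNOWN_PATHS.get(path)
    if body is None:
        # The custom error page is still shown to the visitor,
        # but the status tells crawlers the URL does not exist.
        status, body = "404 Not Found", NOT_FOUND_BODY
    else:
        status = "200 OK"
    start_response(status, [("Content-Type", "text/html; charset=utf-8"),
                            ("Content-Length", str(len(body)))])
    return [body]


def status_for(path):
    """Call the app directly (no network) and return its status line."""
    seen = {}

    def start_response(status, headers, exc_info=None):
        seen["status"] = status

    app({"PATH_INFO": path}, start_response)
    return seen["status"]


if __name__ == "__main__":
    print(status_for("/"))
    print(status_for("/no/such/page"))
```

Checking a handful of known-bad URLs this way, or with any HTTP header checker, is a quick way to confirm the fix before waiting on a recrawl.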
When I search site:www.(domainname).com I still get an insane number of pages that no longer exist.
My question: How does Google handle all of these 404s? My client wants all the bad pages removed now, but I don't have much control over that. Getting Google to drop pages that return a 404 is a slow process, and he is still dropping in the rankings.
Is there a way of speeding up the process? It's not reasonable to enter tens of thousands of pages into the URL Removal Tool.
I want to clean house and have Google just index the pages in the sitemap.
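One way to get a handle on the gap between what is indexed and what is in the sitemap is to diff the two lists. The sketch below assumes you already have a list of indexed URLs from somewhere (an export, a crawl, log files); the function names and the example URLs are hypothetical. It parses a standard sitemaps.org `urlset` with Python's standard library.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return the set of <loc> values found in a sitemap <urlset>."""
    root = ET.fromstring(xml_text)
    return {loc.text.strip()
            for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
            if loc.text}


def stale_urls(indexed_urls, xml_text):
    """URLs that are indexed but not listed in the sitemap, sorted."""
    return sorted(set(indexed_urls) - sitemap_urls(xml_text))


if __name__ == "__main__":
    example = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url><loc>http://www.example.com/</loc></url>
      <url><loc>http://www.example.com/about/</loc></url>
    </urlset>"""
    indexed = ["http://www.example.com/",
               "http://www.example.com/about/about/"]
    for url in stale_urls(indexed, example):
        print(url)
```

The resulting list is the set of URLs that should be returning a 404 or a 301, which makes it easier to spot-check them in bulk.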
-
Yeah, all of the 301s are done, but I am trying to get around submitting tens of thousands of URLs to the URL Removal Tool.
-
Make sure you pay special attention to implementing the correct rel canonical. When rel canonical was first introduced, we wanted to be a little careful. We didn’t want to open it up for potential abuse, so you could only use rel canonical within one domain. The only exception to that was you could do it between IP addresses and domains.
But over time we didn’t see people abusing it a lot and if you think about it, if some evil malicious hacker has hacked your website and he’s going to do something to you he’s probably going to put some malware on the page or do a 301 redirect. He’s probably not patient enough to add a rel canonical and then wait for it to be re-crawled and re-indexed and all that sort of stuff.
So we sort of saw that there didn’t seem to be a lot of abuse. Most webmasters use rel canonical in really smart ways. We didn’t see a lot of people accidentally shooting themselves in the foot, which is something we do have to worry about and so a little while after rel canonical was introduced we added the ability to do cross domain rel canonical.
It basically works essentially like a 301 redirect. If you can do a 301 redirect that is still preferred because every search engine knows how to handle those and new search engines will know how to process 301s and permanent redirects.
But we do take a rel canonical and if it’s on one domain and points to another domain we will typically honor that. We always reserve the right to sort of hold back if we think that the webmaster is doing something wrong or making a mistake but in general we will almost always abide by that.
Hope that helps.
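If you want to audit which canonical each page declares, and whether it points to another domain, a small script can pull the tag out of the HTML. This is only a sketch using Python's standard library; the class and function names are hypothetical, and it looks only at the `<link rel="canonical">` element, not at HTTP headers.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel may hold several space-separated values.
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels and attr.get("href"):
            self.canonical = attr["href"]


def find_canonical(html):
    """Return the canonical URL declared in `html`, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


def is_cross_domain(page_url, canonical_url):
    """True when the canonical points at a different hostname."""
    return urlsplit(page_url).hostname != urlsplit(canonical_url).hostname


if __name__ == "__main__":
    page = ('<html><head><link rel="canonical" '
            'href="http://www.example.org/page/"></head></html>')
    target = find_canonical(page)
    print(target, is_cross_domain("http://www.example.com/page/", target))
```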
I have a client who unfortunately had a dispute with her prior IT person, and that person made a mess of the site. Cleaning it up is not the quickest thing, and I do agree that 301 redirects are by far the quickest way to go about it. If you're getting 404 errors and the site is passing link juice, you're going to want to redirect those URLs scattered about the website to the most relevant page.
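Before deploying a large batch of 301s, it can help to sanity-check the redirect map so that no rule loops back on itself or chains through several hops. A minimal sketch, assuming the redirects are kept as a simple old-path to new-path mapping; the paths and the `resolve` helper are hypothetical.

```python
def resolve(redirects, path, max_hops=10):
    """Follow `path` through the redirect map.

    Returns (final_path, hops). Raises ValueError if the map loops
    or if the chain is longer than `max_hops`.
    """
    seen = [path]
    while path in redirects:
        path = redirects[path]
        if path in seen:
            raise ValueError("redirect loop: " + " -> ".join(seen + [path]))
        seen.append(path)
        if len(seen) - 1 > max_hops:
            raise ValueError("redirect chain too long from " + seen[0])
    return path, len(seen) - 1


if __name__ == "__main__":
    redirects = {"/old-a": "/old-b", "/old-b": "/new"}
    print(resolve(redirects, "/old-a"))
```

Any entry that resolves in more than one hop can be rewritten to point straight at its final destination.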
http://jamesmartell.com/matt-cutts/how-does-google-handle-not-found-pages-that-do-not-return-a-404/
http://www.seroundtable.com/404-links-google-15427.html
http://support.google.com/customsearch/bin/topic.py?hl=en&topic=11493&parent=1723950&ctx=topic
https://developers.google.com/custom-search/docs/indexing
https://developers.google.com/custom-search/docs/api
I hope I was of help to you,
Thomas
-
Have you 301 redirected the old URLs to appropriate landing pages? After redirecting, use the URL Removal Tool. It works great for me: I see results within 24 hours, and it removes every URL I have submitted from Google's index.