404s being re-indexed
-
Hi All,
We are experiencing issues with pages that have been 404'd still being indexed. Originally these were /wp-content/ index pages that had made it into Google's index. Once I realized this, I added a directive to our .htaccess to 404 all of these pages, as there were hundreds. I tried to let Google crawl and drop these pages naturally, but after a few months I used the URL removal tool to remove them manually.
However, Google seems to be continually re-indexing these pages, even after they have been manually requested for removal in Search Console. Do you have any suggestions? They all respond with 404s.
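For context, that kind of blanket 404 can be done with a rule along these lines in the root .htaccess (just a sketch assuming mod_rewrite, which WordPress normally has enabled - the pattern is illustrative and only targets the directory-listing URLs, so real asset files under /wp-content/ are left alone):
RewriteEngine On
# Illustrative: return 404 for directory-index URLs under /wp-content/ (paths ending in a slash)
RewriteRule ^wp-content/(.*/)?$ - [R=404,L]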
Thanks
-
Just to follow up - I have now actually 410'd the pages, and the 410s are still being re-indexed.
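(For reference, the 410 version of a rule like the sketch above just swaps the flag - [G] is mod_rewrite's shorthand for 410 Gone:
RewriteRule ^wp-content/(.*/)?$ - [G,L]
Again, purely illustrative.)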
-
I'll check this one out as well, thanks! I used a header response extension called Web Developer, which reveals the presence of X-Robots headers.
-
First it would be helpful to know how you are detecting that it isn't working. What indexation tool are you using to see whether the blocks are being detected? I personally really like this one: https://chrome.google.com/webstore/detail/seo-indexability-check/olojclckfadnlhnlmlekdihebmjpjnoa?hl=en-GB
Or obviously at scale - Screaming Frog
-
Thank you for the quick response,
The pages are truly removed. However, because so many of these types of pages leaked into the index, I added a redirect to keep users on our site - no intention of being "shady"; I just didn't want hundreds of 404s getting clicked and causing a very high bounce rate.
For the X-Robots header, could you offer some insight into why my directive isn't working? I believe it's a regex issue with the wp-content match. I have tried to troubleshoot it to no avail.
<FilesMatch "(wp-content)">
Header set X-Robots-Tag: "noindex, nofollow"
</FilesMatch>
I appreciate the help!
-
Well if a page has been removed and has not been moved to a new destination - you shouldn't redirect a user anyway (which kind of 'tricks' users into thinking the content was found). That's actually bad UX
If the content has been properly removed or was never supposed to be there, just leave it at a 410 (but maybe create a nice custom 410 page, in the same vein as a decent UX custom 404 page). Use the page to admit that the content is gone (without shady redirects) but to point to related posts or products. Let the user decide, but still be useful
If the content is actually still there and hence you are doing a redirect, then you shouldn't be serving 404s or 410s in the first place. You should be serving 301s - just doing HTTP redirects to the content's new (or revised) destination URL
Yes, the HTTP header method is the correct replacement when the HTML implementation gets stripped out. HTTP Header X-Robots is the way for you!
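On the <FilesMatch> snippet you posted: one likely reason it never fires is that FilesMatch only matches against file names, not directory paths, so "wp-content" has nothing to match against; the trailing colon after the header name is also worth dropping. A path-based alternative - just a sketch, assuming mod_setenvif and mod_headers are loaded, and note the "always", which is what gets the header attached to 404/410 responses as well - would be something like:
# Flag any request whose URI contains wp-content (mod_setenvif)
SetEnvIf Request_URI "wp-content" WPCONTENT_NOINDEX
# Send the noindex header on flagged requests, including error responses (mod_headers)
Header always set X-Robots-Tag "noindex, nofollow" env=WPCONTENT_NOINDEX
Treat that as a starting point rather than gospel - verify with curl -I (or your header extension) that X-Robots-Tag actually comes back on the 410'd URLs.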
-
Thank you! I am in the process of doing so; however, with a 410 I cannot leave my JS redirect in place after the page loads, and this creates some UX issues. Do you have any suggestions to remedy this?
Additionally, after the 410, the non-X-Robots noindex (the meta tag) is now being stripped, so the URL only resolves to a 410 with no noindex or redirect. I am still working on a noindex header; since the 410 is handled server-side, I assume this would be the only way - correct?
-
You know that a 404 effectively tells Google "not found right now, but it may come back", right? By saying a page is only temporarily unavailable, you actively encourage Google to come back later
If you want to say that the page is permanently gone use status code 410 (gone)
Leave the Meta no-index stuff in the HTTP header via X-Robots, that was a good call. But it was a bad call to combine Meta no-index and 404, as they contradict each other ("don't index me now but then do come back and index me later as I'll probably be back at some point")
Use Meta no-index and 410, which agree with each other ("don't index me now and don't bother coming back")
-
Yes, all pages have a noindex. I have also tried to noindex them via .htaccess, to add an extra layer of security, but it seems to be incorrect. I believe it is an issue with the regex - I'm attempting to match anything containing wp-content.
<filesmatch "(wp-content)"="">Header set X-Robots-Tag: "noindex, nofollow"</filesmatch>
-
Back to basics: have you marked those pages/posts as 'noindex'? With many WP plugins, you can noindex them in bulk and then submit them for re-indexation.