404's being re-indexed
-
Hi All,
We are experiencing issues with pages that return a 404 still being indexed. Originally, these were /wp-content/ index pages that had been included in Google's index. Once I realized this, I added a directive to our .htaccess to 404 all of these pages, as there were hundreds. I tried to let Google crawl and remove these pages naturally, but after a few months I used the URL removal tool to remove them manually.
However, Google seems to keep re-indexing these pages, even after I manually requested their removal in Search Console. Do you have suggestions? They all return 404s.
Thanks
-
Just to follow up - I have now actually 410'd the pages and the 410's are still being re-indexed.
-
I'll check this one out as well, thanks! I used a header response extension called Web Developer, which reveals the presence of X-Robots headers.
-
First it would be helpful to know how you are detecting that it isn't working. What indexation tool are you using to see whether the blocks are being detected? I personally really like this one: https://chrome.google.com/webstore/detail/seo-indexability-check/olojclckfadnlhnlmlekdihebmjpjnoa?hl=en-GB
Or obviously at scale - Screaming Frog
-
Thank you for the quick response,
The pages are truly removed, however, because there were so many of these types of pages that leaked into the index, I added a redirect to keep users on our site - no intentions of being "shady", I just didn't want hundreds of 404's getting clicked and causing a very high bounce rate.
For the x-robots header, could you offer some insight into why my directive isn't working? I believe it's a regex issue on the wp-content. I have tried to troubleshoot to no avail.
<FilesMatch "(wp-content)">
Header set X-Robots-Tag: "noindex, nofollow"
</FilesMatch>
I appreciate the help!
-
Well, if a page has been removed and has not been moved to a new destination, you shouldn't redirect a user anyway (that kind of 'tricks' users into thinking the content was found). That's actually bad UX.
If the content has been properly removed or was never supposed to be there, just leave it at a 410 (but maybe create a nice custom 410 page, in the same vein as a decent UX custom 404 page). Use the page to admit that the content is gone (without shady redirects) but to point to related posts or products. Let the user decide, but still be useful
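For what it's worth, here is a minimal sketch of how that could look in .htaccess, assuming Apache with mod_alias enabled. The URL pattern (directory-style URLs under /wp-content/) and the /gone.html page are hypothetical placeholders, not something taken from your setup, so adjust both before using it:

```apache
# Assumption: the unwanted URLs are directory-style paths under /wp-content/
# (they end in a slash). Asset files such as images or CSS are not matched.
RedirectMatch gone "^/wp-content/(.*/)?$"

# Hypothetical custom page shown to visitors for any 410 response.
# It must live outside the pattern above, or it would return 410 itself.
ErrorDocument 410 /gone.html
```

The response still carries the 410 status, so nothing is redirected; the visitor simply sees a friendlier page that can link to related posts or products.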
If the content is actually still there and that is why you are doing a redirect, then you shouldn't be serving 404s or 410s in the first place. You should be serving 301s, and just doing HTTP redirects to the content's new (or revised) destination URL
Yes, the HTTP header method is the correct replacement when the HTML implementation gets stripped out. HTTP Header X-Robots is the way for you!
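On why the directive may not be matching: as far as I know, FilesMatch tests the file name only, not the URL path, so a pattern like "(wp-content)" would never match a directory segment. A minimal sketch of one alternative, assuming Apache 2.4 with mod_setenvif and mod_headers enabled and that you really do want the header on every URL containing /wp-content/ (the NOINDEX variable name is just a label I picked):

```apache
# Flag any request whose URL path contains /wp-content/
# (NOINDEX is an arbitrary, hypothetical variable name).
SetEnvIf Request_URI "/wp-content/" NOINDEX

# "always" matters here: a plain "Header set" is only applied to successful
# responses, so it would be skipped on a 404 or 410.
Header always set X-Robots-Tag "noindex, nofollow" env=NOINDEX

# If a custom ErrorDocument is in use, Apache handles it as an internal
# redirect and renames the variable with a REDIRECT_ prefix.
Header always set X-Robots-Tag "noindex, nofollow" env=REDIRECT_NOINDEX
```

I haven't tested this against your server, so check the response headers on one of the affected URLs afterwards to confirm that X-Robots-Tag is actually being sent alongside the 410.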
-
Thank you! I am in the process of doing so; however, with a 410 I cannot keep my JS redirect after the page loads, and this creates some UX issues. Do you have any suggestions to remedy this?
Additionally, after the 410 the non-X-Robots noindex is now being stripped, so the page only resolves to a 410 with no noindex or redirect. I am still working on a noindex header. As the 410 is server-side, I assume this would be the only way, correct?
-
You know that a 404 only means "not found", right? It doesn't say whether the page is gone for good or might come back, so Google may well return later to check again
If you want to say that the page is permanently gone use status code 410 (gone)
Leave the Meta no-index stuff in the HTTP header via X-Robots, that was a good call. But it was a bad call to combine Meta no-index and 404, as together they send a mixed signal ("don't index me now, but I might be back at some point")
Use Meta no-index and 410, which agree with each other ("don't index me now and don't bother coming back")
-
Yes, all pages have a noindex. I have also tried to noindex them using .htaccess, to add an extra layer of protection, but the directive seems to be incorrect. I believe it is an issue with the regex; I am attempting to match anything with wp-content.
<FilesMatch "(wp-content)">
Header set X-Robots-Tag: "noindex, nofollow"
</FilesMatch>
-
Back to basics. Have you marked those pages/posts as 'no-index'? With many WP plugins, you can no-index them in bulk and then submit them for re-indexation.