Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies.
Google has deindexed a page it thinks is set to 'noindex', but is in fact still set to 'index'
-
GSC has flagged an error for a page on our WordPress-powered website, saying the page is included in the sitemap but set to 'noindex'. The page has also been removed from Google's search results.
Page is https://www.onlinemortgageadvisor.co.uk/bad-credit-mortgages/how-to-get-a-mortgage-with-bad-credit/
Looking at the page code, plus using Screaming Frog and Ahrefs crawlers, the page is very clearly still set to 'index'. The SEO plugin we use has not been changed to 'noindex' the page.
I have asked for it to be reindexed via GSC, but I'm concerned about why Google thinks this page was set to 'noindex' in the first place.
Can anyone help with this one? Has anyone seen this before, been hit with this recently, got any advice...?
-
@effectdigital and @jasongmcmahon, did you ever get to the bottom of this? If so, what caused it and what was the long-term fix? GSC and Google seem to be behaving in a peculiar way.
We had a similar issue with this page: https://www.simplyadverse.co.uk/bad-credit-mortgage, but after several cache clears and re-indexing/fix requests it indexed fine.
We now have a page on another, similar site that is stubbornly refusing to index. It's a new site and, other than a simple homepage on the domain, all pages had "noindex" on them while under development.
Several pages on the site behaved like this at launch: GSC said the page was marked as "noindex" but submitted in the sitemap, yet when we checked whether indexing was possible, GSC said it was fine (we had removed the noindex and set up the sitemap). All crawling tools say it's fine and all other pages are now indexed, but this page won't index despite repeated attempts over a couple of weeks: https://simplysl.co.uk/buy-to-let/
Other than the fact that they're all mortgage-related sites/pages, I can't fathom why one page would be troublesome while all the others index OK despite having the same setup and indexing process. Any ideas?
-
Thanks, I'll take a look
-
Thanks for going into so much detail, much appreciated.
We've asked Google to reindex it and 'validate the fix', even though we can't find anything to fix!
-
Hi there, check that caching isn't the issue at the server and CMS levels. Other than that, reindex the page via GSC.
-
This is really weird. Really really weird!
As you say, your site's source code seems to confirm that it is set to index. If we look here, we can plainly see that the coding syntax for a no-index directive is "noindex" (all one word).
Let's look at your source code:
Yep, everything seems fine there! But what if a script is modifying your source code and including the directive - and Google's picking up on that?
If we look at the modified source code which I rendered and saved to a file here:
... we can see, there are no problems here either:
Wow - that's really unhelpful!
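For anyone who wants to repeat the source-code check above without eyeballing the markup, here is a minimal Python sketch. It assumes you have already saved the page's HTML (raw or rendered) into a string; the class and function names are my own and purely illustrative.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() in ("robots", "googlebot"):
            # Directives are comma-separated, e.g. "noindex, nofollow"
            for part in attr_map.get("content", "").split(","):
                if part.strip():
                    self.directives.append(part.strip().lower())


def is_noindexed(html: str) -> bool:
    """Return True if the HTML carries a noindex (or equivalent "none") meta directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives or "none" in parser.directives
```

Running it against both the raw source and the rendered source is worthwhile, since a script could inject the tag after load.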
Let's see what happens if we specifically search Google's live index for the URL:
Interestingly, when we search Google's index for this page, we get this page returned instead.
It makes sense that Google would return that URL if it couldn't return the main URL, as one is nested inside of the other. If everything was healthy, we'd see Google listing both URLs instead of just one of them. Even if you edit my index query to remove the trailing slash, you still only get the nested URL (not the one you want to be showing, which is at a slightly higher-up level)
Another thought I had was: hmm, maybe this is a canonical tag gone rogue. That bore no fruit either, as the page you want indexed canonicals to its own URL, and the two URLs are exactly the same. As such, we can't blame the canonical tag either! I even viewed the modified source to see if it got altered. No dice (the canonical tag is just fine).
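The canonical check can be sketched the same way. This is a rough illustration rather than a full audit: it assumes the HTML is already in a string, uses hypothetical names of my own, and compares the URLs as exact strings so that a trailing-slash mismatch shows up.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalParser(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr_map = {name: (value or "") for name, value in attrs}
        # rel can hold several space-separated values
        if "canonical" in attr_map.get("rel", "").lower().split():
            self.canonical = attr_map.get("href", "").strip()


def canonical_is_self_referencing(html: str, page_url: str) -> bool:
    """Return True only if the canonical href is exactly the page's own URL."""
    parser = CanonicalParser()
    parser.feed(html)
    return parser.canonical == page_url
```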
Maybe the XML sitemap is telling Google not to index the URL?
Nope - that's fine too! No problems there...
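Worth noting that a sitemap can't carry a noindex directive; all it does is list URLs, so the useful check is whether the URL appears there exactly as expected. A small sketch, assuming a standard sitemap (not a sitemap index) already loaded as a string, with a function name I've made up:

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_lists_url(sitemap_xml: str, url: str) -> bool:
    """Return True if any <loc> entry in the sitemap exactly matches the URL."""
    root = ET.fromstring(sitemap_xml)
    for loc in root.findall(".//sm:loc", SITEMAP_NS):
        if (loc.text or "").strip() == url:
            return True
    return False
```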
Could the robots.txt file be interfering?
No! Darn it, that's not the problem
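If you'd rather test robots.txt programmatically, Python's standard library has a parser for it. This is only a sketch: it assumes you've pasted the robots.txt contents into a string, the function name is mine, and the standard-library matcher may not agree with Google's own parser in every edge case (wildcards in particular).

```python
from urllib.robotparser import RobotFileParser


def googlebot_may_crawl(robots_txt: str, url: str) -> bool:
    """Return True if the given robots.txt rules allow Googlebot to fetch the URL."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is needed
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", url)
```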
I know that a no-index or blocking directive can also be sent through the HTTP header (usually via the X-Robots-Tag header). Let's check the response header of your URL out:
Nothing there that really raises my eyebrow. X-XSS-Protection is enabled and set to block, but to be honest that shouldn't affect Google's crawling at all. Anyone correct me if I am wrong, but defending your site against cross-site scripting (XSS) attacks doesn't impede crawling, right?
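The header check is easy to automate too. Below is a hedged sketch that takes response headers as a plain dict (however you fetched them) and looks for a blocking X-Robots-Tag value. The function name is hypothetical, and as a simplification it treats a directive aimed at any named crawler as blocking.

```python
def header_blocks_indexing(headers: dict) -> bool:
    """Return True if an X-Robots-Tag header carries noindex (or "none")."""
    for name, value in headers.items():
        # Header names are case-insensitive
        if name.lower() != "x-robots-tag":
            continue
        for token in value.lower().split(","):
            # Drop an optional "useragent:" prefix, e.g. "googlebot: noindex"
            directive = token.split(":")[-1].strip()
            if directive in ("noindex", "none"):
                return True
    return False
```

A header such as X-XSS-Protection is ignored here, which matches the point above: it has nothing to do with indexing.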
Fudge it. Let's fling it through Google's PageSpeed Insights tool. Usually that will tell you if something is being blocked and why...
Nothing useful still!
Google's mobile-friendly tool gives us some semi-interesting information:
But it doesn't say the page can't be loaded. It only says some resources which the page pulls in can't be loaded! And guess what? They're all external things on other websites (other than a few theme related bits, but nothing IMO that should stop the whole page loading).
Let's try DeepCrawl's indexability checker (they make amazing software by the way... expensive though):
Sir... there is NO GOOD REASON why your URL shouldn't be indexed. I am 99.9% certain you have encountered a legit Google bug. Post about it here. Only Google can help you at this juncture