Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Robots.txt & meta noindex--site still shows up on Google Search
-
I have set up my robots.txt like this:
User-agent: *
Disallow: /
and I have this meta tag in my <head> on a WordPress site, set up with Yoast SEO:
<meta name="robots" content="noindex,follow"/>
I did "Fetch as Google" on my Google Search Console
My website is still showing up in the search results and it says this:
"A description for this result is not available because of this site's robots.txt"
This site has not shown up for years and now it is ranking above my site that I want to rank for this keyword. How do I get Google to ignore this site? This seems really weird and I'm confused how a site with little content, that has not been updated for years can rank higher than a site that is constantly updated and improved.
-
CleverPhd,
Really nice to see a detailed yet to-the-point answer.
Thanks for contributing, and being in the Moz community.
Regards,
Vijay
-
Thanks for that clarification CleverPhD, forgot to mention that.
-
This one has my vote. You have to allow them access in order to see that you don't want the pages indexed. If you block them from seeing this rule...well they won't be able to see it.
-
Just to be clear on what Logan said. You have to allow Google to crawl your site by opening up your robots.txt to Google so it can see your noindex directive that is on each of the pages. Otherwise Google will never "see" the noindex directive on your pages.
Likewise, on sitemap.xml. If you are not allowing Google to crawl the sitemap (because you are blocking it with robots.txt) then Google will not read the sitemap, find all your pages that have the noindex directive on them and then remove those pages from the index.
A great article is here
https://support.google.com/webmasters/answer/93710?hl=en&ref_topic=4598466
From the mouth of Google "Important! For the noindex meta tag to be effective, the page must not be blocked by a robots.txt file. If the page is blocked by a robots.txt file, the crawler will never see the noindex tag, and the page can still appear in search results, for example if other pages link to it."
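To make the point above concrete, here is a minimal sketch using only the Python standard library. It checks whether a crawler that obeys robots.txt could ever reach a page's noindex tag. The function name `noindex_is_visible` and the example.com URL are made up for illustration, and this only approximates how a real crawler behaves.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.directives += [d.strip().lower() for d in content.split(",")]


def noindex_is_visible(robots_txt, page_html, url, agent="Googlebot"):
    """Hypothetical helper: True only if the page is crawlable AND has noindex."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(agent, url):
        # The crawler is not allowed to fetch the page, so it never sees the tag.
        return False
    parser = RobotsMetaParser()
    parser.feed(page_html)
    return "noindex" in parser.directives


page = '<html><head><meta name="robots" content="noindex,follow"/></head></html>'
blocked = "User-agent: *\nDisallow: /"
opened = "User-agent: *\nDisallow:"

print(noindex_is_visible(blocked, page, "https://example.com/"))
print(noindex_is_visible(opened, page, "https://example.com/"))
```

With the `Disallow: /` rules from the original question the check fails, because the page is never fetched; with an empty `Disallow:` it passes.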
The other point that Logan makes is that Google might list your site if there are enough sites linking to it. The steps above should take care of this, as you are deindexing the page, but here is what I think he is referencing
https://www.youtube.com/watch?v=KBdEwpRQRD0
Google will include a site that is blocked in robots.txt if enough pages link to it, even if they have not crawled the URL.
You can go into Search Console and find all the links that Google says are pointing to your site. You can also use tools like CognitiveSEO, Ahrefs, Majestic or Moz to gather up the sites that link to yours, include them in a disavow file, and upload that file in Search Console to tell Google to ignore all of those links to your site.
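As a rough sketch of the disavow step, the snippet below turns a list of linking URLs into disavow file text. The plain-text format (one URL or `domain:` entry per line, `#` for comments) is the one Google's disavow tool accepts. The function name `build_disavow` and the sample URLs are invented for illustration, and the real list would come from whichever backlink export you use.

```python
from urllib.parse import urlparse


def build_disavow(linking_urls, whole_domains=True):
    """Hypothetical helper: build disavow file text from linking page URLs."""
    lines = ["# Links gathered from Search Console and backlink tools"]
    seen = set()
    for url in linking_urls:
        host = urlparse(url).hostname
        if not host:
            continue  # skip anything that is not a full URL
        # Either disavow the whole linking domain or just the single page.
        entry = f"domain:{host}" if whole_domains else url
        if entry not in seen:
            seen.add(entry)
            lines.append(entry)
    return "\n".join(lines) + "\n"


links = [
    "http://spam.example/page-a",
    "http://spam.example/page-b",
    "https://other.example/post",
]
print(build_disavow(links))
```

Disavowing at the domain level keeps the file short when one site links many times; pass `whole_domains=False` if you only want to list the individual pages.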
Secret bonus method. Putting a noindex directive in your robots.txt
https://www.deepcrawl.com/knowledge/best-practice/robots-txt-noindex-the-best-kept-secret-in-seo/
This allows you to manage your noindex directives in your robots.txt. It makes things easier, as you can control all your noindex directives from a central location and block whole folders at a time. This would stop Google from crawling AND indexing pages all in one place, and you can just leave the rest of the site alone and not worry about whether a noindex tag should or should not be on a certain page. Note that Google never officially supported this rule and stopped honoring it in robots.txt on September 1, 2019, so it should not be relied on today.
Good luck!
-
As mentioned by Logan, the noindex meta tag is the most effective way to remove indexed pages. It sometimes takes time, and you have to submit the right sitemap.xml, one which covers the pages/posts you wish to get removed from the Google index.
-
I did read that about the robots.txt and that is why I added the noindex.
I use SEO Yoast for sitemap.xml, so shouldn't all my pages be there? I believe they are because I just looked at it a couple days ago.
So are you saying I should look through my backlink profile (WMT) and try to remove any backlinks?
Would 'Fetch as Google' not ping Google to tell them to recrawl?
Thanks for your help.
-
Hi,
First things first, it's a common misconception that the robots.txt disallow: / will prevent indexing. It's only intended to prevent crawling, which is why you don't get a meta description pulled into the result snippet. If you have links pointing to that page and a disallow: / on your robots.txt, it's still eligible for indexation.
Second, it's pretty weird that the noindex tag isn't effective, as that's the only sure-fire way to get de-indexed intentionally. I would recommend creating an XML sitemap for all URLs on that domain that are noindex'd and resubmit that in Search Console. If Google hasn't crawled your site since adding the noindex, they don't know it's there. In my experience, forcing them to recrawl via XML submission has been effective at getting noindex noticed quicker.
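For the sitemap of noindex'd URLs suggested above, a minimal sketch with the Python standard library might look like this. It follows the sitemaps.org 0.9 format; the function name `build_sitemap` and the example URLs are placeholders, and on a WordPress site a plugin would normally generate this for you.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Hypothetical helper: return sitemap XML listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        # Each page gets a <url> entry with a single <loc> child.
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


noindexed = [
    "https://example.com/",
    "https://example.com/old-page/",
]
print(build_sitemap(noindexed))
```

Save the output as a file on the domain (for example sitemap-noindex.xml, a name chosen here for illustration) and submit it in Search Console so the noindex'd pages get recrawled.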
I would also recommend taking a look at the link profile and removing any possible links pointing to your noindex pages; this will help keep them from being indexed again in the future.