Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies.
Robots.txt & meta noindex--site still shows up on Google Search
-
I have set up my robots.txt like this:
User-agent: *
Disallow: /
and I have this meta tag in my <head> on a WordPress site, set up with SEO Yoast:
<meta name="robots" content="noindex,follow"/>
I did "Fetch as Google" on my Google Search Console
My website is still showing up in the search results and it says this:
"A description for this result is not available because of this site's robots.txt"
This site has not shown up for years and now it is ranking above my site that I want to rank for this keyword. How do I get Google to ignore this site? This seems really weird and I'm confused how a site with little content, that has not been updated for years can rank higher than a site that is constantly updated and improved.
-
CleverPhd,
Really nice to see a detailed yet to-the-point answer.
Thanks for contributing, and being in the Moz community.
Regards,
Vijay
-
Thanks for that clarification CleverPhD, forgot to mention that.
-
This one has my vote. You have to allow them access in order to see that you don't want the pages indexed. If you block them from seeing this rule...well they won't be able to see it.
-
Just to be clear on what Logan said. You have to allow Google to crawl your site by opening up your robots.txt to Google so it can see your noindex directive that is on each of the pages. Otherwise Google will never "see" the noindex directive on your pages.
The same goes for sitemap.xml. If you are blocking the sitemap with robots.txt, Google will not read it, so it will not find the pages that carry the noindex directive and remove them from the index.
A great article is here
https://support.google.com/webmasters/answer/93710?hl=en&ref_topic=4598466
From the mouth of Google "Important! For the noindex meta tag to be effective, the page must not be blocked by a robots.txt file. If the page is blocked by a robots.txt file, the crawler will never see the noindex tag, and the page can still appear in search results, for example if other pages link to it."
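To make the conflict concrete, here is a minimal Python sketch using only the standard library. It checks whether a crawler that obeys robots.txt could ever reach the noindex tag. The helper names, the example.com URL, and the sample page are all hypothetical, and this only approximates how a real crawler behaves.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def can_crawl(robots_txt, url, agent="Googlebot"):
    """True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def noindex_is_visible(robots_txt, html, url):
    """True only if the page is crawlable AND carries a noindex directive."""
    meta = RobotsMetaParser()
    meta.feed(html)
    return can_crawl(robots_txt, url) and "noindex" in meta.directives


# Hypothetical page and URL, mirroring the setup described in the question.
PAGE = '<html><head><meta name="robots" content="noindex,follow"/></head></html>'
URL = "https://example.com/"

BLOCKED = "User-agent: *\nDisallow: /"  # the robots.txt from the question
OPEN = "User-agent: *\nDisallow:"       # an empty Disallow permits crawling

print(noindex_is_visible(BLOCKED, PAGE, URL))  # False: the tag is never seen
print(noindex_is_visible(OPEN, PAGE, URL))     # True: the tag can be read
```

With the original robots.txt the noindex tag is present but unreachable, which matches the behaviour Google describes in the quote above.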
The other point that Logan makes is that Google might list your site if there are enough sites linking to it. The steps above should take care of this, as you are deindexing the page, but here is what I think he is referencing:
https://www.youtube.com/watch?v=KBdEwpRQRD0
Google will include a site that is blocked in robots.txt if enough pages link to it, even if they have not crawled the URL.
You can go into Search Console and find all the links it reports as pointing to your site. You can also use tools like CognitiveSEO, Ahrefs, Majestic, or Moz to gather the sites that link to you, then list them in a disavow file that you upload in Search Console to tell Google to ignore those links.
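As a rough illustration of that last step, the Python sketch below turns a list of linking URLs exported from one of those tools into disavow-file text. It assumes the plain-text format Google documents for disavow files (one "domain:" entry per line, with "#" for comments). The function name and the sample URLs are hypothetical.

```python
from urllib.parse import urlparse


def build_disavow(link_urls, note="Links pointing at a site we want ignored"):
    """Return disavow-file text with one 'domain:' line per linking host."""
    hosts = set()
    for link in link_urls:
        host = urlparse(link).hostname
        if host:  # skip anything that is not a full URL
            hosts.add(host.lower())
    lines = [f"# {note}"]
    lines += [f"domain:{host}" for host in sorted(hosts)]
    return "\n".join(lines) + "\n"


# Hypothetical backlinks exported from a link tool.
backlinks = [
    "http://spam.example/page1",
    "http://spam.example/page2",
    "https://links.example.org/list",
]
print(build_disavow(backlinks), end="")
# # Links pointing at a site we want ignored
# domain:links.example.org
# domain:spam.example
```

Disavowing whole domains is the blunt option. If only a few specific pages are the problem, full URLs can be listed instead of "domain:" entries.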
Secret bonus method: putting a noindex directive in your robots.txt
https://www.deepcrawl.com/knowledge/best-practice/robots-txt-noindex-the-best-kept-secret-in-seo/
This allows you to manage your noindex directives in your robots.txt. It makes things easier, as you can control all your noindex directives from a central location and block whole folders at a time. This would stop Google from crawling AND indexing pages all in one place, and you can leave the rest of the site alone without worrying about whether a noindex tag should or should not be on a certain page. Be aware that Google never officially supported this, and it stopped honoring noindex in robots.txt on 1 September 2019, so the meta tag is the reliable route.
Good luck!
-
As mentioned by Logan, the noindex meta tag is the most effective way to remove indexed pages. It sometimes takes time, and you have to submit the right sitemap.xml, one that covers the pages/posts you wish to get removed from the Google index.
-
I did read that about the robots.txt and that is why I added the noindex.
I use SEO Yoast for sitemap.xml, so shouldn't all my pages be there? I believe they are because I just looked at it a couple days ago.
So are you saying I should look through my backlink profile (WMT) and try to remove any backlinks?
Would 'Fetch as Google' not ping Google to tell them to recrawl?
Thanks for your help.
-
Hi,
First things first, it's a common misconception that the robots.txt disallow: / will prevent indexing. It's only intended to prevent crawling, which is why you don't get a meta description pulled into the result snippet. If you have links pointing to that page and a disallow: / in your robots.txt, it's still eligible for indexation.
Second, it's pretty weird that the noindex tag isn't effective, as that's the only sure-fire way to get de-indexed intentionally. I would recommend creating an XML sitemap for all URLs on that domain that are noindex'd and resubmit that in Search Console. If Google hasn't crawled your site since adding the noindex, they don't know it's there. In my experience, forcing them to recrawl via XML submission has been effective at getting noindex noticed quicker.
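For anyone who wants to build that sitemap by hand rather than through a plugin, here is a minimal Python sketch that writes a sitemap in the sitemaps.org 0.9 format for a list of noindexed URLs. The function name and the example.com URLs are hypothetical. The resulting file would still need to be uploaded to the site and submitted in Search Console.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML text with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical list of pages that carry the noindex tag.
noindexed = [
    "https://example.com/",
    "https://example.com/about/",
]

with open("sitemap-noindex.xml", "w", encoding="utf-8") as fh:
    fh.write(build_sitemap(noindexed))
```

Keeping the noindexed URLs in their own sitemap file makes it easy to drop that file once the pages have left the index.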
I would also recommend taking a look at the link profile and removing any links you can that point to your noindex pages. This will help prevent them from being indexed again in the future.