Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Meta NoIndex tag and Robots Disallow

Hi all, I hope you can spare some time to answer the first of a few questions.

We are running a Magento site, and a layered/faceted navigation nightmare has created thousands of duplicate URLs! While tackling the issue, I disallowed in robots.txt every query-string parameter except p (which I allowed for pagination). After checking some pages in Google, I ran site:www.mydomain.com/specificpage.html and a few duplicates came up along with the original, showing "There is no information about this page because it is blocked by robots.txt".

I had also added meta noindex, follow to all these duplicates, but I guess it wasn't being read because of robots.txt. So, coming to my questions:

- Did robots.txt block access to these pages? If so, were they already in the index, so that after I disallowed them in robots.txt Googlebot could no longer read the meta noindex?
- Does meta noindex, follow on pages actually help Googlebot decide to remove these pages from the index?

I thought robots.txt would stop and prevent indexation? But I've read this:

"Noindex is a funny thing, it actually doesn't mean 'You can't index this', it means 'You can't show this in search results'. Robots.txt disallow means 'You can't index this' but it doesn't mean 'You can't show it in the search results'."

I'm a bit confused about how to use these, both to prevent duplicate content in the first place and to address duplicate content once it's already in the index.

Thanks! B

There's no real way to estimate how long the re-crawl will take, Ben. You can get a bit of an idea by looking at the crawl rate reported in Google Webmaster Tools.

Yes, asking for a page fetch and then submitting it with linked pages for each of the main website sections can help speed up crawl discovery. In addition, make sure you've submitted a current sitemap and that it's being found correctly (also reported in GWT).

You should also do the same in Bing Webmaster Tools. Too many sites forget about optimizing for Bing - even if it's only 20% of Google's traffic, there's no point throwing it away.

Lastly, earning some new links to different sections of the site is another great signal. This can often be done effectively and quickly using social media - especially Google+, as it gets crawled very quickly.

As for your other question - yes, once you get the unwanted URLs out of the index, you can add the robots.txt disallow back in to optimise your crawl budget. I would strongly recommend you leave the meta-robots no-index tag in place though, as a "belt & suspenders" approach to keep pages linking into those unwanted pages from triggering a re-indexing. It's OK to have both in place as long as the de-indexing has already been accomplished, as we've discussed.

Hope that answers your questions?

Paul
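
The sequencing described above can be sketched in a few lines of Python. This is only an illustration, not part of any tool mentioned in the thread: the function name `deindex_plan`, the shape of its input (a mapping from URL to whether it is still indexed, which you would have to gather yourself, for example from manual `site:` checks), and the example rule `/*?` are all assumptions.

```python
def deindex_plan(indexed_by_url: dict[str, bool], disallow_rule: str) -> dict:
    """Decide what to publish while unwanted URLs are being de-indexed.

    indexed_by_url: hypothetical input, URL -> True if it still shows in the index.
    disallow_rule:  the robots.txt rule to restore once de-indexing is done.
    """
    # URLs the crawler still needs to revisit so it can read the noindex tag.
    pending = sorted(url for url, indexed in indexed_by_url.items() if indexed)
    return {
        # The meta tag stays in place in both phases ("belt & suspenders").
        "meta_robots": "noindex, follow",
        # Hold the disallow back until nothing unwanted is left in the index.
        "robots_disallow": [] if pending else [disallow_rule],
        "pending_urls": pending,
    }


if __name__ == "__main__":
    status = {"/shoes.html?color=red": True, "/shoes.html?size=9": False}
    print(deindex_plan(status, "/*?"))
```

While any unwanted URL is still indexed the disallow list stays empty, so the crawler can reach the page and see the tag; only when the pending list is empty does the rule go back in.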

So once Google has started to see the meta noindex and is slowly de-indexing pages, I would like to block it from crawling them with robots.txt when that is done, to conserve my crawl budget. But there are still internal links on the site that point to these URLs - would they get back into the index in this case?

Hi Paul,

Thank you for your detailed answer - so I'm not going crazy.

I did try canonicals, but then realized they are more of a suggestion than a directive. I am still correcting a lot of duplicate content and 404s, so I imagine Google views the site as "these guys don't know what they are doing" and may have ignored the canonical suggestion.

So what I have done is remove the robots.txt block on the pages I want de-indexed and add meta noindex, follow to those pages. From what you are saying, they should naturally de-index, after which I will put the robots.txt block back on to keep my crawl budget spent on better areas of the site.

How long, in your opinion, can it take for Googlebot to de-index the pages? Can I help it along at all to speed things up? Fetch page and linking pages as Googlebot?

Thanks again, Ben

You're right to be confused, B. The terminology is unfortunate and misleading.

To answer your questions: 1. Yes. 2. Yes.

A disallow in robots.txt does nothing to remove already-indexed pages. That's not its purpose. Its only purpose is to tell the search crawlers not to waste their time crawling those pages. Even if pages have been blocked in robots.txt, they will remain in the index if already there. Even if never crawled, and blocked in robots.txt, they can still end up indexed if some other indexed page links to them and the crawlers find those pages by following links. Again, nothing in a robots.txt disallow tells the engines to remove a page from the index, just not to waste time crawling it.

Put another way, the robots.txt disallow directive only disallows crawling - it says nothing about what to do if the page gets into the index in other ways.

The meta-robots no-index tag, however, explicitly states to the crawler: "If you arrive at this page, do not add it to the index. If it is already in the index, remove it."

And yes - as you suspected - if pages are blocked in robots.txt, the crawler obeys and doesn't visit those pages, so it can't discover the no-index command to drop them from the index. Thus the only way a page could get dropped is if a crawler followed a link from an external site and discovered the page that way. A very inefficient way of trying to get all those pages out of the index.

Bottom line - robots.txt is never the correct tool to deal with duplicate content issues. Its sole purpose is to keep the crawlers from wasting time on unimportant pages so they can spend more time finding (and therefore indexing) more important pages.

The three tools for dealing with duplicate content are meta-robots no-index tags in a page header, 301 redirects, and canonical tags. Which one to use depends on the architecture of your site, your intended purpose, and the site's technical limitations.

Hope that makes sense?

Paul
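
The interaction described above can be made concrete with a rough Python sketch. It models a crawler that obeys robots.txt before fetching, so a no-index tag on a blocked URL is never read. Everything here is an illustrative assumption rather than how any search engine is actually implemented: the helper names, the simplified Google-style rule matching (`*` wildcard and `$` end anchor only, no Allow rules), and the sample rule `/*?` that blocks any URL with a query string.

```python
import re
from html.parser import HTMLParser


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a simplified robots.txt path rule into a regex."""
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    # '*' matches any run of characters; everything else is literal.
    pattern = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))


def is_crawl_allowed(path: str, disallow_rules: list[str]) -> bool:
    """True if no disallow rule matches the start of the path."""
    return not any(rule_to_regex(rule).match(path) for rule in disallow_rules)


class RobotsMetaParser(HTMLParser):
    """Records whether the page carries <meta name="robots" content="...noindex...">."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = (attributes.get("content") or "").lower()
        if name == "robots" and "noindex" in content:
            self.noindex = True


def crawler_outcome(path: str, html: str, disallow_rules: list[str]) -> str:
    """Describe what an obedient crawler would learn about one URL."""
    if not is_crawl_allowed(path, disallow_rules):
        # The page is never fetched, so its meta tags are invisible.
        return "blocked: noindex never read"
    parser = RobotsMetaParser()
    parser.feed(html)
    return "crawled: noindex read" if parser.noindex else "crawled: indexable"


NOINDEX_PAGE = '<html><head><meta name="robots" content="noindex, follow"></head></html>'

if __name__ == "__main__":
    # Disallow in place: the tag on the duplicate is never seen.
    print(crawler_outcome("/shoes.html?color=red", NOINDEX_PAGE, ["/*?"]))
    # Disallow lifted: the crawler can fetch the page and read the tag.
    print(crawler_outcome("/shoes.html?color=red", NOINDEX_PAGE, []))
```

The first call reports the URL as blocked even though the page carries the tag, which is the situation in the original question; lifting the rule is what lets the tag take effect.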