Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Meta NoIndex tag and Robots Disallow
-
Hi all,
I hope you can spend some time to answer my first of a few questions

We are running a Magento site - layered/faceted navigation nightmare has created thousands of duplicate URLS!
Anyway, during my process to tackle the issue, I disallowed in Robots.txt anything in the querystring that was not a p (allowed this for pagination).
After checking some pages in Google, I did a site:www.mydomain.com/specificpage.html and a few duplicates came up along with the original with
"There is no information about this page because it is blocked by robots.txt"So I had added in Meta Noindex, follow on all these duplicates also but I guess it wasnt being read because of Robots.txt.
So coming to my question.
-
Did robots.txt block access to these pages? If so, were these already in the index and after disallowing it with robots, Googlebot could not read Meta No index?
-
Does Meta Noindex Follow on pages actually help Googlebot decide to remove these pages from index?
I thought Robots would stop and prevent indexation? But I've read this:
"Noindex is a funny thing, it actually doesn’t mean “You can’t index this”, it means “You can’t show this in search results”. Robots.txt disallow means “You can’t index this” but it doesn’t mean “You can’t show it in the search results”.I'm a bit confused about how to use these in both preventing duplicate content in the first place and then helping to address dupe content once it's already in the index.
Thanks!
B
-
-
There's no real way to estimate how long the re-crawl will take, Ben. You can get a bit of an idea by looking at the crawl rate reported in Google Webmaster Tools.
Yes, asking for a page fetch then submitting with linked pages for each of the main website sections can help speed up the crawl discovery. In addition, make sure you've submitted a current sitemap and it's getting found correctly (also reported in GWT) You should also do the same in Bing Webmaster Tools. Too many sites forget about optimizing for Bing - even if it's only 20% of Google's traffic, there's no point throwing it away.
Lastly, earning some new links to different sections of the site is another great signal. This can often be effectively & quickly done using social media - especially Google+ as it gets crawled very quickly.
As far as your other question - yes, once you get the unwanted URLs out of the index, you can add the robots.txt disallow back in to optimise your crawl budget. I would strongly recommend you leave the meta-robots no-index tag in place though as a "belt & suspenders" approach to keep pages linking into those unwanted pages from triggering a re-indexing. It's OK to have both in place as long as the de-indexing has already been accomplished, as we've discussed.
Hope that answer your questions?
Paul
-
So once Google has started to see the meta-noindex and is slowly deindexing pages, once that is done, I would like to block it from crawling them with a robots.txt to conserve my crawl budget.
But, there are still internal links on the site that point to these URL´s - would they get back into the index in this case?
-
Hi Paul,
Thank you for your detailed answer - so I'm not going crazy

I did try with canonicals but then realized they are more of a suggestion as opposed to a directive and I am still correcting a lot of dupe content and 404's so I am imagining that Google view's the site as "these guys don't know what they are doing' so may have ignored the canonical suggestion.
So what I have done is remove the robots block on the pages I want de-indexed and add in meta noindex, follow on these pages - From what you are saying, they should naturally de-index, after which, I will put the robots.txt block back on to keep my crawl budget spent on better areas of the site.
How long in your opinion can it take for Googlebot to de-index the pages? Can I help it along at all to speed up? Fetch page and linking pages as Googlebot?
Thanks again,
Ben
-
You're right to be confused, B. The terminology is unfortunate and misleading.
To answer your questions
1. Yes
2. Yes.
A disallow in robots.txt does nothing to remove already-indexed pages. That's not its purpose. Its only purpose is to tell the search crawlers not to waste their time crawling those pages. Even if pages have been blocked in robots, they will remain in the index if already there. Even if never crawled, and blocked in robots.txt, they can still end up indexed if some other indexed page links to them and the crawlers find those pages by following links. Again, nothing in a robots.txt disallow tells the engines to remove a page from the index, just not to waste time crawling it.
Put another way, the robots.txt disallow directive only disallows crawling - it says nothing about what to do if the page gets into the index in other ways.
The meta-robots no-index tag however explicitly states to the crawler "if you arrive at this page, do not add it to the index. If it is already in the index, remove it".
And yea - as you suspected - if pages are blocked in robots.txt, the crawler obeys and doesn't visit those pages So it can't discover the no-index command to drop them from the index. Thus the only way a page could get dropped is if a crawler followed a link from an external site and discovered the page that way. A very inefficient way of trying to get all those pages out of the index.
Bottom line - robots.txt is never the correct tool to deal with duplicate content issues. It's sole purpose is to keep the crawlers from wasting time on unimportant pages so they can spend more time finding (and therefore indexing) more important pages.
The three tools for dealing with duplicate content are meta-robots no-index tags in a page header, 301 redirects, and canonical tags. Which one to use depends on the architecture of your site, your intended purpose, and the site's technical limitations.
Hope that makes sense?
Paul
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Near Duplicate Title Tag Checker
Hi Everyone, I know there are a lot of tools like Siteliner, which can check the uniqueness of body copy, but are there any that can restrict the check to the title tags alone? Alternatively, is there an Excel or Google Sheets function that would allow me to do the same thing? Thanks, Andy
Intermediate & Advanced SEO | | AndyRSB0 -
Should I use noindex or robots to remove pages from the Google index?
I have a Magento site and just realized we have about 800 review pages indexed. The /review directory is disallowed in robots.txt but the pages are still indexed. From my understanding robots means it will not crawl the pages BUT if the pages are still indexed if they are linked from somewhere else. I can add the noindex tag to the review pages but they wont be crawled. https://www.seroundtable.com/google-do-not-use-noindex-in-robots-txt-20873.html Should I remove the robots.txt and add the noindex? Or just add the noindex to what I already have?
Intermediate & Advanced SEO | | Tylerj0 -
Are ALL CAPS construed as spamming if they are used in a meta description tag call to action?
I know this seems like an old school question. As a long time SEO I would never use ALL CAPS in a title tag (unless a brand name is capitalized). However I recently came across a Moz video about creating better calls to action in the meta description tags. Some of the examples had CTAs that were using all caps (i.e. CALL NOW! or LOWEST QUOTES!) I realize there is a debate about the user experience implications. However I'm more concerned about search engines penalizing websites that are using ALL CAPS CTAs in their meta description tags. Any feedback/advice would be appreciated. Thanks
Intermediate & Advanced SEO | | RosemaryB0 -
Google Ignoring Canonical Tag for Hundreds of Sites
Bazaar Voice provides a pretty easy-to-use product review solution for websites (especially sites on Magento): https://www.magentocommerce.com/magento-connect/bazaarvoice-conversations-1.html If your product has over a certain number of reviews/questions, the plugin cuts off the number of reviews/questions that appear on the page. To see the reviews/questions that are cut off, you have to click the plugin's next or back function. The next/back buttons' URLs have a parameter of "bvstate....." I have noticed Google is indexing this "bvstate..." URL for hundreds of sites, even with the proper rel canonical tag in place. Here is an example with Microsoft: http://webcache.googleusercontent.com/search?q=cache:zcxT7MRHHREJ:www.microsoftstore.com/store/msusa/en_US/pdp/Surface-Book/productID.325716000%3Fbvstate%3Dpg:8/ct:r+&cd=2&hl=en&ct=clnk&gl=us My website is seeing hundreds of these "bvstate" urls being indexed even though we have a proper rel canonical tag in place. It seems that Google is ignoring the canonical tag. In Webmaster Console, the main source of my duplicate titles/metas in the HTML improvements section is the "bvstate" URLs. I don't necessarily want to block "bvstate" in the robots.txt as it will prohibit Google from seeing the reviews that were cutoff. Same response for prohibiting Google from crawling "bvstate" in Paramters section of Webmaster Console. Should I just keep my fingers crossed that Google honors the rel canonical tag? Home Depot is another site that has this same issue: http://webcache.googleusercontent.com/search?q=cache:k0MBLFcu2PoJ:www.homedepot.com/p/DUROCK-Next-Gen-1-2-in-x-3-ft-x-5-ft-Cement-Board-172965/202263276%23!bvstate%3Dct:r/pg:2/st:p/id:202263276+&cd=1&hl=en&ct=clnk&gl=us
Intermediate & Advanced SEO | | redgatst1 -
Proper Title Tags for ecommerce
In terms of E-commerce title tags. We are a manufacturer of our own clothing products. We are new to the SEO landscape so if this question is an obvious answer, then i apologize for wasting any one times in advance. We Manufacture our own clothing. Each item has a name. The names are American womens names such as amanda or lori or jenniffer etc. When we create the title tag for them should we include the name of the item itself at the beginning or end. Â For example should it be Item Name - Keyword - Keyword - Brand Name(aka manufacturer) or Keyword - Keyword - Item Name - Brand Name (aka manufacturer) The reason we ask this is because we think it would be a waste to rank for actual American names such as Jennifer and Jessica. All that we have read on Moz suggests that it seems to be better to have pertinent keywords in the beginning of the title as opposed to the end. In terms of our brand name we already rank number 1 for every combination of our brand. So we would like to start picking up traffic for the different product types we sell and there respective synonyms. Not sure if i am making any sense. Sorry in advance, and any help is very very much appreciated.
Intermediate & Advanced SEO | | Imagination0 -
Is it alright to repeat a keyword in the title tag?
I know at first glance, the answer to this is a resounding NO, that it can be construed as keyword stuffing,
Intermediate & Advanced SEO | | MIGandCo
but please hear me out. I am working on optimizing a client's website and although MOST of the title tags
can be optimized without repeating a keyword, occasionally I run into one where it doesn't read right if I
don't repeat the keyword. Here's an example: Current title:
Photoshop on the Cloud | Adobe Photoshop Webinars | Company Name What I am considering using as the optimized title:
Adobe Photoshop on the Cloud | Adobe Photoshop Webinars | Company Name Yes, I know both titles are longer than recommended. In both instances, only the company name gets
truncated so I am not too worried about that. So I guess what I want to know is this: Am I right in my original assumption that it is NEVER okay to
repeat keywords in a title tag or is it alright when it makes sense to do so?0 -
H3 Tags - Should I Link to my content Articles- ? And do I have to many H3 tags/ Links as it is ?
Hello All, On my ecommerce landing pages, I currently have links to my products as H3 Tags. I also have useful guides displayed on the page with links useful articles we have written (they currently go to my news section). I am wondering if I should put those article links as additional H3 tags as well for added seo benefit  or do I have to many tags as it is ?. A link to my Landing Page I am talking about is - http://goo.gl/h838RW Screenshot of my h1-h6 tags - http://imgur.com/hLtX0n7 I enclose screenshot my guides and also of my H1-H6 tags. Any advice would be greatly appreciated. thanks Peter
Intermediate & Advanced SEO | | PeteC120 -
Canonical tag - but Title and Description are slightly different
I am building a new SEO site with a "Silo" / Themed architecture.  I have a travel website selling hotel reservations.  I list a hotel page under a city page - example, www.abc.com/Dallas/Hilton.html   Then I use that same property under a segment within the city - example www.abc.com/Dallas/Downtown/Hilton.html, so there are two URLs with the same content Both pages are identical, except I want to customize the Title and Description.  I want to customize the title and description to build a consistent theme - for example the /Downtown/Hilton page will have the words "Near Downtown" in the Title and Description, while the primary city Hilton page will not.  So I have two questions about this. First, is it okay to use a canonical tag if the Title and Description are slightly different?  Everything else is identical. If so, will Google crawl and comprehend the unique Title and Description on the "Downtown" silo? I want Google to see that I have several "supporting" pages to my main landing page(s).  I want to present to Google 5 supporting pages in each silo that each has a supporting keyword theme.  But I'm not sure if Google will consider content of pages that point to a different page using the canonical tag. Please see this supporting example:  http://d.pr/i/aQPv Thanks for your insights. Rob
Intermediate & Advanced SEO | | partnerf0