Should I use meta noindex and robots.txt disallow?
-
Hi, we have an alternate "list view" version of every one of our search results pages
The list view has its own URL, indicated by a URL parameter
I'm concerned about wasting our crawl budget on all these list view pages, which effectively doubles the amount of pages that need crawling
When they were first launched, I had the noindex meta tag be placed on all list view pages, but I'm concerned that they are still being crawled
Should I therefore go ahead and also apply a robots.txt disallow on that parameter to ensure that no crawling occurs? Or, will Googlebot/Bingbot also stop crawling that page over time? I assume that noindex still means "crawl"...
Thanks
-
Hi,
Thanks, I will do some testing to confirm that this behaves how I would like it to
-
if all pages are 100#5 not indexed then I would block it in robots.txt, Google's John Muller confirmed to me that Googlebot will continue to crawl every link to check to see if a nofollow or noindex has changed status.
So as a result we blocked our pages with robots.txt and saw a great increases in index/crawl rates on pages we want Google to pay attention to. It also reduces waste in server resources.
However if there are any pages that are index, if you block them in robots.txt then Googlebot will never be able to crawl the link to determine that it should be noindex. This means it could stay in a permanent stage of indexed.
I hope that answers all your questions?
-
When you say:
nofollow will tell the crawlers to not crawl the page
I believe you mean to say that this will tell the crawlers not to crawl the links on the page, the page itself is itself still "crawled" is it not?
But yes, you are right to say, that once robots.txt disallow is in place, the meta tag will not be seen and thus be moot (at which point I may as well take it off).
It would be nice to be able to say "don't crawl this and don't put it in the index"... but is there a way?
-
noindex only tells the search crawlers to not include the page in the index but still allows for them to crawl the page. nofollow will tell the crawlers to not crawl the page.
robots.txt will accomplish this as well but both I think would be overkill.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Need help with Robots.txt
An eCommerce site built with Modx CMS. I found lots of auto generated duplicate page issue on that site. Now I need to disallow some pages from that category. Here is the actual product page url looks like
Intermediate & Advanced SEO | | Nahid
product_listing.php?cat=6857 And here is the auto generated url structure
product_listing.php?cat=6857&cPath=dropship&size=19 Can any one suggest how to disallow this specific category through robots.txt. I am not so familiar with Modx and this kind of link structure. Your help will be appreciated. Thanks1 -
Changing domains - best process to use?
I am about to move my Thailand-focused travel website into a new, broader Asia-focused travel website. The Thailand site has had a sad history with Google (algorithmic, not penalties) so I don't want that history to carry over into the new site. At the same time though, I want to capture the traffic that Google is sending me right now and I would like my search positions on Bing and Yahoo to carry through if possible. Is there a way to make all that happen? At the moment I have migrated all the posts over to the new domain but I have it blocked to search engines. I am about to start redirecting post for post using meta-refresh redirects with a no-follow for safety. But at the point where I open the new site up to indexing, should I at the same time block the old site from being indexed to prevent duplicate content penalties? Also, is there a method I can use to selectively 301 redirect posts only if the referrer is Bing or Yahoo, but not Google, before the meta-refresh fires? Or alternatively, a way to meta-refresh redirect if the referrer is Google but 301 redirect otherwise? Or is there a way to "noindex, nofollow" the redirect only if the referrer is Google? Is there a danger of being penalised for doing any of these things? Late Edit: It occurs to me that if my penalties are algorithmic (e.g. due to bad backlinks), does 301 redirection even carry that issue through to the new website? Or is it left behind on the old site?
Intermediate & Advanced SEO | | Gavin.Atkinson0 -
Problems with a NoIndex NoFollow Site
For legal reasons my website is going to launch non-branded websites. We do not have the capacity to make these site sufficiently unique from the main site so we are planning on having them be NoIndex NoFollow. Are there any potential SEO problems here? What will the implication be if in ~1-2 years from launching the NoIndex NoFollow we make the site unique, take away the tag and want to start promoting these sites organically. Thanks!
Intermediate & Advanced SEO | | theLotter0 -
MOZ crawl report says category pages blocked by meta robots but theyr'e not?
I've just run a SEOMOZ crawl report and it tells me that the category pages on my site such as http://www.top-10-dating-reviews.com/category/online-dating/ are blocked by meta robots and have the meta robots tag noindex,follow. This was the case a couple of days ago as I run wordpress and am using the SEO Category updater plugin. By default it appears it makes categories noindex, follow. Therefore I edited the plugin so that the default was index, follow as I want google to index the category pages so that I can build links to them. When I open the page in a browser and view source the tags show as index, follow which adds up. Why then is the SEOMOZ report telling me they are still noindex,follow? Presumably the crawl is in real time and should pick up the new follow tag or is it perhaps because its using data from an old crawl? As yet these pages aren't indexed by google. Any help is much appreciated! Thanks Sam.
Intermediate & Advanced SEO | | SamCUK0 -
How many time should a keyword be used in the body of text?
We employee an outside agency to write content for our website as we do not have the ability in house to write unique and good quality content. They have just sent an article which is around 300 words. I told them the keyword phrases to use. When I got the document there is only 1 instance of the keyword phrase(s) in it. Now there seems to be a conflict here amongst posts I have read and general SEO advise as to how many times it should be present (SEOmoz indicates 4 times for instance), our outside agency says it doesn't matter. Now if I have a page optimised for 2 keywords this starts making things tricky and probably looks keyword stuffed to the reader. Assuming the keywords are present once in meta tags, H1, meta descriptions and alt text, what do people think is best practice taking into account recent panda updates? Thoughts appreciated. Thanks Craig
Intermediate & Advanced SEO | | Towelsrus0 -
All In One SEO PACK Configuration - Index or Noindex?
I'm finding conflicting information about the right way to configure the All in One SEO Pack wordpress plugin. Do I index or noindex for the items below? Use noindex for Categories - yes or no? Use noindex for Archives - yes or no? Use noindex for Tag Archives - yes or no?
Intermediate & Advanced SEO | | webestate0 -
When using ALT tags - are spaces, hyphens or underscores preferred by Google when using multiple words?
when plugging ALT tags into images, does Google prefer spaces, hyphens, or underscores? I know with filenames, hyphens or underscores are preferred and spaces are replaced with %20. Thoughts? Thanks!
Intermediate & Advanced SEO | | BrooklynCruiser3 -
How can I check if the FOLLOW,NOINDEX tag is working?
Hi everyone! After reading about pagination practices, a few days ago we introduced the <meta name="robots" content="FOLLOW,NOINDEX" /> tag, to prevent duplicate content. You can find an example below: http://www.inmonova.com/en/properties?page=2 I have been checking yahoo site explorer and result pages still get indexed. My question is: Am I doing something wrong? Is the code incorrect (follow,noindex - noindex,follow)? Or does it just take some time to have effect? Thanks in advance.
Intermediate & Advanced SEO | | inmonova0