Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies.
Should I use meta noindex and robots.txt disallow?
-
Hi, we have an alternate "list view" version of every one of our search results pages.
The list view has its own URL, indicated by a URL parameter.
I'm concerned about wasting our crawl budget on all these list view pages, which effectively double the number of pages that need crawling.
When they were first launched, I had the noindex meta tag placed on all list view pages, but I'm concerned that they are still being crawled.
Should I therefore go ahead and also apply a robots.txt disallow on that parameter to ensure that no crawling occurs? Or will Googlebot/Bingbot also stop crawling those pages over time? I assume that noindex still means "crawl"...
Thanks.
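For reference, a disallow on the parameter would typically be a rule such as `Disallow: /*?*view=list` under `User-agent: *`, where `view=list` is a hypothetical stand-in for whatever the list-view parameter actually is. Google documents `*` as matching any run of characters and a trailing `$` as anchoring the end of the URL, and resolves allow/disallow conflicts by the longest matching rule, with allow winning a tie. Python's built-in `urllib.robotparser` does not implement those wildcards, so the minimal sketch below hand-rolls that matching to test candidate rules against sample URLs before deploying them. It is an approximation under those documented rules, not Google's own parser.

```python
import re


def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters and a trailing '$' anchors the end;
    everything else is matched literally from the start of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Decide whether `path` (path plus query string) may be crawled.

    `rules` holds ("allow" | "disallow", pattern) pairs for one user-agent
    group. The longest matching pattern wins; on a tie, allow wins.
    A path that matches no rule is allowed.
    """
    best_len = -1
    best_allow = True
    for kind, pattern in rules:
        if not pattern:
            # An empty Allow/Disallow value matches nothing.
            continue
        if rule_to_regex(pattern).match(path):
            length = len(pattern)
            allow = kind == "allow"
            if length > best_len or (length == best_len and allow):
                best_len, best_allow = length, allow
    return best_allow


# Hypothetical rule set: block any URL whose query string contains view=list.
RULES = [("disallow", "/*?*view=list")]

if __name__ == "__main__":
    for url in ["/search?q=shoes", "/search?q=shoes&view=list"]:
        print(url, "->", "crawl" if is_allowed(url, RULES) else "blocked")
```

Note that this pattern would also block a URL containing `preview=list`, so it is worth running the real parameter names from your site through a check like this before relying on the rule.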
-
Hi,
Thanks, I will do some testing to confirm that this behaves the way I would like it to.
-
If all of the pages are 100% not indexed, then I would block them in robots.txt. Google's John Mueller confirmed to me that Googlebot will continue to crawl every link to check whether a nofollow or noindex has changed status.
As a result, we blocked our pages with robots.txt and saw a great increase in index/crawl rates on the pages we want Google to pay attention to. It also reduces wasted server resources.
However, if any of the pages are still indexed and you block them in robots.txt, Googlebot will never be able to crawl the link to see that the page should be noindexed. This means it could stay permanently indexed.
I hope that answers all your questions?
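The ordering problem in this answer can be illustrated with a deliberately simplified toy model. `UrlState` and `recrawl` are hypothetical names; the model ignores crawl scheduling and link discovery, and it does not capture that Google may still list a blocked URL (without its content) if other pages link to it. It only shows why a Disallow rule added before the noindex tag has been processed freezes the URL's index state, while adding the rule afterwards does not.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class UrlState:
    indexed: bool     # currently in the search index
    noindex: bool     # the page's HTML carries a robots noindex meta tag
    disallowed: bool  # a robots.txt Disallow rule matches the URL


def recrawl(state: UrlState) -> UrlState:
    """One simplified crawl attempt (a toy model, not Googlebot's logic)."""
    if state.disallowed:
        # robots.txt blocks the fetch, so the noindex tag is never read
        # and the index state cannot change.
        return state
    if state.noindex:
        # The page is fetched, the tag is seen, and the URL is dropped.
        return replace(state, indexed=False)
    return state


# Order A: add the Disallow rule while the URL is still indexed.
blocked_first = recrawl(UrlState(indexed=True, noindex=True, disallowed=True))

# Order B: keep the URL crawlable until noindex has been processed,
# then add the Disallow rule.
dropped = recrawl(UrlState(indexed=True, noindex=True, disallowed=False))
blocked_after = recrawl(replace(dropped, disallowed=True))

print(blocked_first.indexed)  # True: stuck in the index
print(blocked_after.indexed)  # False
```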
-
When you say:
nofollow will tell the crawlers to not crawl the page
I believe you mean to say that this will tell the crawlers not to crawl the links on the page; the page itself is still "crawled", is it not?
But yes, you are right to say, that once robots.txt disallow is in place, the meta tag will not be seen and thus be moot (at which point I may as well take it off).
It would be nice to be able to say "don't crawl this and don't put it in the index"... but is there a way?
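To make the distinction in this reply concrete: robots meta directives are read from a page that has already been fetched, so `noindex` governs whether that page is indexed and `nofollow` governs whether its links are followed (`none` is documented by Google as shorthand for both); neither stops the page itself from being crawled. The minimal sketch below uses Python's standard `html.parser`. `RobotsMetaParser` and `robots_policy` are hypothetical names, and the sketch only reads the generic `robots` meta name, not crawler-specific names such as `googlebot`.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # html.parser lowercases tag and attribute names; values may be None.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for token in attr.get("content", "").split(","):
                if token.strip():
                    self.directives.add(token.strip().lower())


def robots_policy(html: str) -> dict[str, bool]:
    """Summarise what the robots meta tag asks of a page already fetched."""
    parser = RobotsMetaParser()
    parser.feed(html)
    found = parser.directives
    return {
        "may_index_page": not ({"noindex", "none"} & found),
        "may_follow_links": not ({"nofollow", "none"} & found),
    }


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(robots_policy(page))
# {'may_index_page': False, 'may_follow_links': True}
```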
-
noindex only tells the search crawlers not to include the page in the index, but still allows them to crawl the page. nofollow will tell the crawlers to not crawl the page.
robots.txt will accomplish this as well, but I think using both would be overkill.
Related Questions
-
Translating meta tags using WPML and AIO SEO
Having a heck of a time finding info on this one... We're working on a multilingual website which uses WPML. I've used the All in One SEO plugin to customize meta data (title, description, etc). These strings do not appear in the list of translations in WPML. Does anyone have any experience with this setup? How do you enable WPML to translate meta data set via the AIO plugin? Thanks!
Intermediate & Advanced SEO | jonmc
-
Using disavow tool for 404s
Hey Community, Got a question about the disavow tool for you. My site is getting thousands of 404 errors from old blog/coupon/you name it sites linking to our old URL structure (which used underscores and ended in .jsp). It seems like the webmasters of these sites aren't answering back or haven't updated their sites in ages, so it's returning 404 errors. If I disavow these domains and/or links, will it clear out these 404 errors in Google? I read the GWT help page on it, but it didn't seem to answer this question. Feel free to ask any questions that may help you understand the issue more. Thanks for your help, -Reed
Intermediate & Advanced SEO | IceIcebaby
-
Block in robots.txt instead of using canonical?
When I use a canonical tag for pages that are variations of the same page, it basically means that I don't want Google to index this page. But at the same time, spiders will go ahead and crawl the page. Isn't this a waste of my crawl budget? Wouldn't it be better to just disallow the page in robots.txt and let Google focus on crawling the pages that I do want indexed? In other words, why should I ever use rel=canonical as opposed to simply disallowing in robots.txt?
Intermediate & Advanced SEO | YairSpolter
-
Canonical tag + HREFLANG vs NOINDEX: Redundant?
Hi, We launched our new site back in Sept 2013, and to control indexation, traffic, etc. we only allowed the search engines to index single-dimension pages such as just category, brand or collection, but never combinations like category + brand, brand + collection or collection + category. We are now opening indexing to double-faceted pages like category + brand and the new tag structure would be: For any other facet we're including a "noindex, follow" meta tag. 1. My question is: if we're including a "noindex, follow" tag on select pages, do we need to include a canonical or hreflang tag after all? Should we include it either way for when we want to remove the "noindex"? 2. Is the x-default redundant? Thanks for any input. Cheers WMCA
Intermediate & Advanced SEO | WMCA
-
Using pictures from another domain
We are building several sites for several clients which will be using images from the manufacturer. Our dev team wants to insert the manufacturer's url for the images, instead of actually downloading the image and hosting on our server. There are thousands of images, so downloading images to our server will be time consuming, so we are looking for a shortcut.... however I'm concerned this will cause other issues. Is using manufactueresdomain.com/12345.jpg going to cause SEO issues? will this generate Google penalties? Since we are not able to control the image file name, we cannot optimize it. We will add Alt text and Title tag for each image, but the file name is random characters. How important is the file name for SEO?
Intermediate & Advanced SEO | Branden_S
-
Noindex : Do Follow or No Follow Tags?
Hello, I have a website with tags (which have the noindex tag) on each article post. I've been told that I should noindex/nofollow these tag pages, because they are getting link juice passed to them, and since they aren't getting indexed, it's wasting link juice on those pages when it could be passed to a page that is actually getting indexed. What are your thoughts on this? Also, what would be the point of noindex/follow on a page, if you are noindexing that page? Isn't it just wasting link juice? What is the proper SEO way to optimize tags?
Intermediate & Advanced SEO | WebServiceConsulting.com
-
NoIndexing Massive Pages all at once: Good or bad?
If you have a site with a few thousand high-quality and authoritative pages, and tens of thousands of search results and tag pages with thin content, and you noindex,follow the thin content pages all at once, will Google see this as a good or bad thing? I am only trying to do what Google's guidelines suggest, but since I have so many pages indexed on my site, will throwing the noindex tag on ~80% of the thin content pages negatively impact my site?
Intermediate & Advanced SEO | WebServiceConsulting.com
-
Robots.txt is blocking Wordpress Pages from Googlebot?
I have a robots.txt file on my server which I did not develop; it was done by the web designer at the company before me. Then there is a WordPress plugin that generates a robots.txt file. How do I unblock all the WordPress pages from Googlebot?
Intermediate & Advanced SEO | ENSO