Should I use meta noindex and robots.txt disallow?
-
Hi, we have an alternate "list view" version of every one of our search results pages
The list view has its own URL, indicated by a URL parameter
I'm concerned about wasting our crawl budget on all these list view pages, which effectively doubles the amount of pages that need crawling
When they were first launched, I had the noindex meta tag be placed on all list view pages, but I'm concerned that they are still being crawled
Should I therefore go ahead and also apply a robots.txt disallow on that parameter to ensure that no crawling occurs? Or, will Googlebot/Bingbot also stop crawling that page over time? I assume that noindex still means "crawl"...
Thanks
-
Hi,
Thanks, I will do some testing to confirm that this behaves how I would like it to
-
if all pages are 100#5 not indexed then I would block it in robots.txt, Google's John Muller confirmed to me that Googlebot will continue to crawl every link to check to see if a nofollow or noindex has changed status.
So as a result we blocked our pages with robots.txt and saw a great increases in index/crawl rates on pages we want Google to pay attention to. It also reduces waste in server resources.
However if there are any pages that are index, if you block them in robots.txt then Googlebot will never be able to crawl the link to determine that it should be noindex. This means it could stay in a permanent stage of indexed.
I hope that answers all your questions?
-
When you say:
nofollow will tell the crawlers to not crawl the page
I believe you mean to say that this will tell the crawlers not to crawl the links on the page, the page itself is itself still "crawled" is it not?
But yes, you are right to say, that once robots.txt disallow is in place, the meta tag will not be seen and thus be moot (at which point I may as well take it off).
It would be nice to be able to say "don't crawl this and don't put it in the index"... but is there a way?
-
noindex only tells the search crawlers to not include the page in the index but still allows for them to crawl the page. nofollow will tell the crawlers to not crawl the page.
robots.txt will accomplish this as well but both I think would be overkill.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Twitter Robots.TXT
Hello Moz World, So, I trying to wrap my head around all of the different robots.txt. I decided to dive into a site like Twitter, and look at their robot text. And now, I'm super confused. What are they telling the search engines with /hasttag/*src=. Why don't they just use: Useragent: * Disallow: But, they address each search engine. Is there any benefit to this? Thanks for all of the awesome responses!!! B/R Will H.
Intermediate & Advanced SEO | | MarketingChimp100 -
SSL and robots.txt question - confused by Google guidelines
I noticed "Don’t block your HTTPS site from crawling using robots.txt" here: http://googlewebmastercentral.blogspot.co.uk/2014/08/https-as-ranking-signal.html Does this mean you can't use robots.txt anywhere on the site - even parts of a site you want to noindex, for example?
Intermediate & Advanced SEO | | McTaggart0 -
Help with Robots.txt On a Shared Root
Hi, I posted a similar question last week asking about subdomains but a couple of complications have arisen. Two different websites I am looking after share the same root domain which means that they will have to share the same robots.txt. Does anybody have suggestions to separate the two on the same file without complications? It's a tricky one. Thank you in advance.
Intermediate & Advanced SEO | | Whittie0 -
Google not taking Meta...
Hello all, So I understand that Google may sometimes take content from the page as a snippet to display on SERPs rather than the meta description, but my problem goes a little beyond that. I have a section on my site which updates everyday so a lot of the content is dynamics (products for a shop, every morning unique stock is added or removed), and despite having a meta description, title and receiving an 'A' grade in the MOZ on page grader, these pages never show up in Google. After a little research I did a 'site:www.mysite.com/productpage' in Google and this indeed listed all my products, but interestingly for every single one Google had taken the copyright notice at the bottom of the page as the snippet instead of the meta or any H1, H2 or P text on the page... Does anyone have any idea why Google is doing this? It would explain a lot to me in terms of overall traffic, I'm just out of ideas... Thanks!
Intermediate & Advanced SEO | | HB170 -
Meta Keywords Good or Bad
Hi All, I've been reading more about the meta keyword tag and why it may not be a good idea to include them on pages and am looking for thoughts/feedback on this idea. If you have employed this tactic, can you give me some insight into any results you saw. If you decided to not employ this tactic, why did you choose not to? I wan to understand all sides of this before employing any changes to my company's websites. Thank you for your help!
Intermediate & Advanced SEO | | airnwater0 -
Will blocking urls in robots.txt void out any backlink benefits? - I'll explain...
Ok... So I add tracking parameters to some of my social media campaigns but block those parameters via robots.txt. This helps avoid duplicate content issues (Yes, I do also have correct canonical tags added)... but my question is -- Does this cause me to miss out on any backlink magic coming my way from these articles, posts or links? Example url: www.mysite.com/subject/?tracking-info-goes-here-1234 Canonical tag is: www.mysite.com/subject/ I'm blocking anything with "?tracking-info-goes-here" via robots.txt The url with the tracking info of course IS NOT indexed in Google but IT IS indexed without the tracking parameters. What are your thoughts? Should I nix the robots.txt stuff since I already have the canonical tag in place? Do you think I'm getting the backlink "juice" from all the links with the tracking parameter? What would you do? Why? Are you sure? 🙂
Intermediate & Advanced SEO | | AubieJon0 -
Do Google use HTTPS as a trust indicator?
Scenario: Two sites, exactly the same with a form to capture customer details on the home page (e.g. name, address). Would Google rank a site that uses HTTPS over a site that uses HTTP? From what I've heard, they would trust the HTTPS site more than HTTP and therefore rank it higher. Forum opinions?
Intermediate & Advanced SEO | | PeterAlexLeigh0 -
How often do you refresh meta descriptions? And does refreshing meta descriptions help in ranking?
1. And does refreshing meta descriptions help in ranking? 2. How often do you refresh meta descriptions? I am not sure whether it's a new feature, but I noticed this little time stamp on one of the search results, it says 1 day ago, and the website ranked quite high. So is there any correlation here? o4JIw.png
Intermediate & Advanced SEO | | robotseo0