"noindex, follow" or "robots.txt" for thin content pages
-
Does anyone have any testing evidence on whether "noindex, follow" or a robots.txt disallow is better to use for pages with thin content that are nevertheless important to keep on a website? I am referring to content shared across multiple websites (such as e-commerce, real estate, etc.). Imagine a website with 300 high-quality pages indexed and 5,000 thin product-type pages that would not generate relevant search traffic. The question is: does the interlinking value achieved by "noindex, follow" outweigh the negative of Google having to crawl all those "noindex" pages? With robots.txt, Google's crawling focus stays on just the important pages that are indexed, and that may give rankings a boost. Any experiments with insight into this would be great.
I do get the usual advice about "make the pages unique", "get customer reviews and comments", etc., but the question above is the important one here.
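For reference, the two options being compared look roughly like this (illustrative snippet only; the /thin-products/ path is a placeholder):
Option A - a meta robots tag on each thin page, which keeps the page crawlable and lets its links pass value:
<meta name="robots" content="noindex, follow">
Option B - a robots.txt rule, which stops crawling altogether (note that a disallowed URL can still appear in the index as a URL-only listing if other pages link to it):
User-agent: *
Disallow: /thin-products/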
-
trung.ngo - check out this article I posted http://www.blindfiveyearold.com/crawl-optimization
that's where I got my "inspiration" from to consider using robots.txt instead...
-
I am thinking that excluding the thin pages from being crawled (via robots.txt) may be better than my current setup - the thin pages are already "noindex, follow".
You are saying "unless there's evidence that the pages are taking up too much of the crawl bandwidth, it doesn't seem like too much of an issue to me" - but how would I know this? Is it fair to assume that for a website with 5,000 pages this is probably not an issue?
My concern with "noindex, follow" is that Google may think "ah, we have seen all this stuff before. Thanks for keeping it out of our index, but we are still going to devalue your original, indexed content because we crawl and see all this thin stuff." I am thinking the robots.txt approach would potentially be a stronger signal that could help my indexed pages. Or do you think this is minor and probably not relevant?
-
Hello there,
Have you had any duplicate content or crawling issues in the past or is this more of a preventative measure? If the pages, as you put it, "would not generate relevant search traffic", then I would argue that it'd make sense to "noindex, follow" based on the assumption that the pages are not currently driving search traffic, and have no real potential to contribute significantly to brand discovery via a search engine in the future.
I wouldn't necessarily say that Google crawling your page more frequently would automatically give you a boost in rankings; it's more associated with whether or not they're crawling pages frequently enough to index updates to the pages. So unless there's evidence that the pages are taking up too much of the crawl bandwidth, it doesn't seem like too much of an issue to me.
All of this to say, take a look at the data to see if a real problem exists--whether crawl resources or duplicate content--before doing anything drastic. And, of course, also understand what you'll be losing by making the updates. If you do choose to prevent crawling via robots.txt and are at all concerned with the duplicate/thin content aspect, remember to implement a noindex and confirm that the pages are removed from search results before disallowing in robots.txt--otherwise, they'll remain indexed.
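To make that sequence concrete (an illustrative sketch only; /thin-products/ is a placeholder path): first leave the pages crawlable with the meta tag in place,
<meta name="robots" content="noindex, follow">
then, once the URLs have dropped out of the search results, add the disallow:
User-agent: *
Disallow: /thin-products/
If the disallow goes in first, Googlebot can no longer see the noindex tag, so the URLs can remain indexed.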
-
Hi Keri, there are some good comments, but none really answers this question, which is why I am trying to approach it from different angles. Maybe you can shed some light on this:
AJ Kohn wrote this great article: http://www.blindfiveyearold.com/crawl-optimization - he talks about using robots.txt to exclude thin content in order to increase the frequency with which indexed content gets crawled, supposedly helping rankings. In this great Whiteboard Friday, Rand suggests using "noindex, follow": http://moz.com/blog/handling-duplicate-content-across-large-numbers-of-urls. I am trying to get more light on this from people who have experience with it, but I struggle to get answers.
-
I noticed you had similar questions at http://moz.com/community/q/unique-content-below-fold-better-move-above-fold and http://moz.com/community/q/risk-using-nofollow-tag with several answers each, including some that were marked as Good Answer. Did any of those answers help to answer your question?
Related Questions
-
Robots.txt blocked internal resources Wordpress
Intermediate & Advanced SEO | Mat_C
Hi all, We've recently migrated a Wordpress website from staging to live, but the robots.txt was deleted. I've created the following new one:
User-agent: *
Allow: /
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Allow: /wp-admin/admin-ajax.php
However, in the site audit on SemRush, I now get the mention that a lot of pages have issues with blocked internal resources in the robots.txt file. These blocked internal resources are all cached and minified CSS elements: links, images and scripts. Does this mean that Google won't crawl some parts of these pages with blocked resources correctly, and thus won't be able to follow these links and index the images? In other words, is this any cause for concern regarding SEO? Of course I can change the robots.txt again, but will urls like https://example.com/wp-content/cache/minify/df983.js end up in the index? Thanks for your thoughts!
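For context, one commonly suggested adjustment (sketched here as an illustration only, not a recommendation for this specific site) is to keep the directory disallows but explicitly re-allow the CSS and JavaScript files Google needs for rendering; Googlebot supports the * and $ patterns used below, and the longer, more specific Allow rules take precedence:
User-agent: *
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Allow: /wp-content/cache/*.css$
Allow: /wp-content/cache/*.js$
Allow: /wp-content/themes/*.css$
Allow: /wp-content/themes/*.js$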
Rel="prev" / "next"
Hi guys, The tech department implemented rel="prev" and rel="next" on this website a long time ago.
Intermediate & Advanced SEO | | AdenaSEO
We also added a canonical tag to the 'own' page. We're talking about the following situation: https://bit.ly/2H3HpRD However we still see a situation where a lot of paginated pages are visible in the SERP.
Is this just a case of rel="prev" and "next" being directives to Google?
And in this specific case, Google deciding to not only show the 1st page in the SERP, but still show most of the paginated pages in the SERP? Please let me know, what you think. Regards,
Tom1 -
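For reference, the markup described (sketched with placeholder URLs) would typically sit in the head of page 2 of a paginated series, with a self-referencing canonical alongside the prev/next annotations:
<link rel="canonical" href="https://example.com/category/page/2/">
<link rel="prev" href="https://example.com/category/">
<link rel="next" href="https://example.com/category/page/3/">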
-
If my website does not have a robots.txt file, does it hurt my website ranking?
Intermediate & Advanced SEO | binhlai
After a site audit, I found out that my website doesn't have a robots.txt. Does it hurt my website's rankings? One more thing: when I type mywebsite.com/robots.txt, it automatically redirects to the homepage. Please help!
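For reference, a minimal, permissive robots.txt (shown purely as an illustration; the sitemap URL is a placeholder) looks like this:
User-agent: *
Disallow:
Sitemap: https://www.example.com/sitemap.xml
An empty Disallow line means nothing is blocked.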
-
If robots.txt has blocked an image (image URL) but another page which can be indexed uses this image, how is the image treated?
Intermediate & Advanced SEO | Malika1
Hi MOZers, This probably is a dumb question, but I have a case where robots.txt has an image URL blocked, yet this image is used on a page (let's call it Page A) which can be indexed. If the image on Page A has an alt tag, then how is this information digested by crawlers? A) Would Google totally ignore the image and the alt tag information? Or B) would Google consider the alt tag information? I am asking this because all the images on the website are blocked by robots.txt at the moment, but I would really like website crawlers to crawl the alt tag information. Chances are that I will ask the webmaster to allow indexing of images too, but I would like to understand what's happening currently. Looking forward to all your responses 🙂 Malika
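To illustrate the setup being described (placeholder paths only): the robots.txt blocks the image directory, while an indexable page still references one of the images with alt text:
User-agent: *
Disallow: /images/
...and in the HTML of Page A:
<img src="/images/red-office-chair.jpg" alt="Red leather office chair">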
Putting "noindex" on a page that's in an iframe... what will that mean for the parent page?
If I've got a page that is being called in an iframe, on my homepage, and I don't want that called page to be indexed.... so I put a noindex tag on the called page (but not on the homepage) what might that mean for the homepage? Nothing? Will Google, Bing, Yahoo, or anyone else, potentially see that as a noindex tag on my homepage?
Intermediate & Advanced SEO | | Philip-DiPatrizio0 -
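The situation described, sketched with placeholder URLs: the homepage embeds a framed page, and only the framed page carries the noindex.
In the homepage HTML (no robots meta tag here):
<iframe src="https://www.example.com/embedded-widget.html"></iframe>
In the head of /embedded-widget.html:
<meta name="robots" content="noindex">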
-
Wordpress Tag Pages - NoIndex?
Intermediate & Advanced SEO | s_EOgi_Bear
Hi there. I am using the Yoast Wordpress plugin. I just wonder if any tests have been done on the effects of index vs. noindex for tag pages (like when tagging a word relevant to an article)? Thanks 🙂 Martin
Pipe ("|") in my website's title is being replaced with ":" in Google results
Hi , One of the websites I'm promoting and working on is www.pau-brasil.co.il.
Intermediate & Advanced SEO | | Kadel
It's wordpress-based website and as you can see the html's Title is "PauBrasil | some hebrew slogan".
(Screenshot: http://i.imgur.com/2f80EEY.gif)
When I'm searching for "PauBrasil" (Which is the brand's name) , one of the results google shows is "PauBrasil: Some Hebrew Slogan" (Screenshot: http://i.imgur.com/eJxNHrO.gif ) Why does the pipe is being replaced with ":" ?
And not just that , as you can see there's a "blank space" missing between the the ":" to the slogan.
(note: the websites has been indexed by google crawler at least 4 times so I find it hard to believe it can be the reason) I've keep on looking and found out that there's another page in that website with the exact same title
but when I'm looking for it in google , it shows the title as it really is , with pipe. ("|").
(Screenshot: http://i.imgur.com/dtsbZV2.gif) Have you ever encountered something like that?
Can it be that the duplicated title cause that weird "replacement"? Thanks in advance,
Kadel0 -
-
Block an entire subdomain with robots.txt?
Intermediate & Advanced SEO | kylesuss12
Is it possible to block an entire subdomain with robots.txt? I write for a blog that has its root domain as well as a subdomain pointing to the exact same IP. Getting rid of the subdomain is not an option, so I'd like to explore other options to avoid duplicate content. Any ideas?
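For reference, robots.txt is evaluated per host, so the subdomain would need to serve its own file; a minimal sketch, assuming the subdomain is blog.example.com and can return a different robots.txt than the root domain:
At https://blog.example.com/robots.txt only:
User-agent: *
Disallow: /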