Restricted by robots.txt: does this cause problems?
-
According to Webmaster Tools, I have restricted around 1,500 links, which are links to retailers' websites and affiliate links.
Is this the right approach? I thought it would affect the link juice. Or should I take the nofollowed links out of the robots.txt restrictions?
-
Hello Ocelot,
I am assuming you have a site that has affiliate links and you want to keep Google from crawling those affiliate links. If I am wrong, please let me know. Going forward with that assumption then...
That is one way to do it. So perhaps you first send all of those links through a redirect via a folder called /out/ or /links/ or whatever, and you have blocked that folder in the robots.txt file. Correct? If so, this is how many affiliate sites handle the situation.
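As a sketch, the robots.txt block for that setup might look like the following (the /out/ folder name is just the example above — substitute whatever folder your redirect script actually lives in):

```
User-agent: *
Disallow: /out/
```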
I would not rely on rel="nofollow" alone, though I would use it in addition to the robots.txt block.
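Combined, an individual affiliate link would then look something like this (the path and retailer name are placeholders for whatever your redirect script uses):

```html
<a href="/out/retailer-123" rel="nofollow">Retailer Name</a>
```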
There are many other ways to handle this. For instance, you could make all affiliate links JavaScript links instead of href links. Then you could put the JavaScript into a folder called /js/ or something like that, and block that in the robots.txt file. This works less and less now that Google Preview Bot seems to be ignoring the disallow statement in those situations.
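A minimal sketch of the JavaScript-link approach, assuming a hypothetical /out/ redirect path and a data-link-id attribute on the clickable elements (none of these names come from the question — they are placeholders):

```javascript
// Resolve an affiliate link ID to its outbound redirect URL.
// The /out/ path is a hypothetical blocked folder, as discussed above.
function outboundUrl(linkId) {
  return '/out/' + encodeURIComponent(linkId);
}

// In the browser, attach click handlers so elements behave like links
// without exposing a crawlable href. Guarded so this also runs outside a DOM.
if (typeof document !== 'undefined') {
  document.querySelectorAll('[data-link-id]').forEach(function (el) {
    el.addEventListener('click', function () {
      window.location.href = outboundUrl(el.dataset.linkId);
    });
  });
}
```

The trade-off is that users without JavaScript can't follow the links, and as noted, Google is increasingly able to execute JavaScript anyway.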
You could make it all the same URL with a unique identifier of some sort that tells your database where to redirect the click. For example:
www.yoursite.com/outlink/mylink#123
or
www.yoursite.com/mylink?link-id=123
In which case you could then block /mylink in the robots.txt file and tell Google to ignore the link-ID parameter via Webmaster Tools.
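On the server side, the single-URL approach boils down to a lookup from link ID to destination. A minimal sketch, assuming a hypothetical lookup table (in practice this would be a database query), which a redirect handler would use to issue a 302:

```javascript
// Hypothetical lookup table: link ID -> retailer destination.
const linkTable = {
  123: 'https://retailer-one.example.com/product',
  124: 'https://retailer-two.example.com/product'
};

// Given the link-id parameter value, return the URL to redirect to,
// or null when the ID is unknown (so the handler can return a 404 instead).
function resolveOutlink(linkId) {
  return linkTable[linkId] || null;
}
```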
As you can see, there is more than one way to skin this cat. The problem is always going to be doing it without looking like you're trying to "fool" Google - because they WILL catch up with any tactic like that eventually.
Good luck!
Everett
-
From a coding perspective, applying nofollow to the links is the best way to go.
With the robots.txt file, only the top-tier search engines respect the information it contains. Lesser-known bots or spammers might check your robots.txt file to see what you don't want listed, and that information gives them a starting point to look deeper into your site.