Restricted by robots.txt does this cause problems?
-
I have restricted around 1,500 links which are links to retailers website and links that affiliate links accorsing to webmaster tools
Is this the right approach as I thought it would affect the link juice? or should I take the no follow out of the restricted by robots.txt file
-
Hello Ocelot,
I am assuming you have a site that has affiliate links and you want to keep Google from crawling those affiliate links. If I am wrong, please let me know. Going forward with that assumption then...
That is one way to do it. So perhaps you first send all of those links through a redirect via a folder called /out/ or /links/ or whatever, and you have blocked that folder in the robots.txt file. Correct? If so, this is how many affiliate sites handle the situation.
I would not rely on rel nofollow alone, though I would use that in addition to the robots.txt block.
There are many other ways to handle this. For instance, you could make all affilaite links javascript links instead of href links. Then you could put the javascript into a folder called /js/ or something like that, and block that in the robots.txt file. This works less and less now that Google Preview Bot seems to be ignoring the disallow statement in those situations.
You could make it all the same URL with a unique identifyer of some sort that tells your database where to redirect the click. For example:
www.yoursite.com/outlink/mylink#123
or
www.yoursite.com/mylink?link-id=123
In which case you could then block /mylink in the robots.txt file and tell Google to ignore the link-ID parameter via Webmaster Tools.
As you can see, there is more than one way to skin this cat. The problem is always going to be doing it without looking like you're trying to "fool" Google - because they WILL catch up with any tactic like that eventually.
Good luck!
Everett
-
From a coding perspective, applying the nofollow to the links is the best way to go.
With the robots.txt file, only the top tier search engines respect the information contained within, so lesser known bots or spammers might check your robots.txt file to see what you don't want listed, and that info will give them a starting point to look deeper into your site.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Problem with duplicate pages due to mobile site.
Hey everyone, We've got an issue where our current shopping cart provider (Volusion) allows us to use canonical and rel="alternate" links, however the canonical links are forced on our Desktop as well as mobile pages. When they should only be on the mobile pages. You can view what I mean at the below two pages: http://www.absoluteautomation.ca/fgd400-sensaphone400-p/fgd400.htm https://www.absoluteautomation.ca/mobile/Product.aspx?ProductCode=FGD400 Does anyone have any ideas in terms of working around this?
Technical SEO | | absoauto0 -
Is there any problem with my information structure?
Hey guyz I have a client who got a very interesting structure that I've ever seen. He has got a navigation down link, and with that he links every page on his site , with his every each page.
Technical SEO | | atakala
I mean each page link each page with dropdown navigational menu. ( Menu can be crawled .)
And the other interesting thing is in the image . http://prntscr.com/3q7zp6 He has a level 1 page that has a huge content in it.
But he links every topic of the content with another link which is anchor text link I mean this (http://site.com/level2page.html#part1).
How Google treats this ?
Is there anything wrong with it ? I mean amount of it .
Thank you! 3q7zp60 -
A problem with duplicate content
I'm kind of new at this. My crawl anaylsis says that I have a problem with duplicate content. I set the site up so that web sections appear in a folder with an index page as a landing page for that section. The URL would look like: www.myweb.com/section/index.php The crawl analysis says that both that URL and its root: www.myweb.com/section/ have been indexed. So I appear to have a situation where the page has been indexed twice and is a duplicate of itself. What can I do to remedy this? And, what steps should i take to get the pages re-indexed so that this type of duplication is avoided? I hope this makes sense! Any help gratefully received. Iain
Technical SEO | | iain0 -
Can't find mistake in robots.txt
Hi all, we recently filled our robots.txt file to prevent some directories from crawling. Looks like: User-agent: * Disallow: /Views/ Disallow: /login/ Disallow: /routing/ Disallow: /Profiler/ Disallow: /LILLYPROFILER/ Disallow: /EventRweKompaktProfiler/ Disallow: /AccessIntProfiler/ Disallow: /KellyIntProfiler/ Disallow: /lilly/ now, as Google Webmaster Tools hasn't updated our robots.txt yet, I checked our robots.txt in some ckeckers. They tell me that the User agent: * contains an error. **Example:** **Line 1: Syntax error! Expected <field>:</field> <value></value> 1: User-agent: *** **`I checked other robots.txt written the same way --> they work,`** accordign to the checkers... **`Where the .... is the mistake???`** ```
Technical SEO | | accessKellyOCG0 -
Oh no googlebot can not access my robots.txt file
I just receive a n error message from google webmaster Wonder it was something to do with Yoast plugin. Could somebody help me with troubleshooting this? Here's original message Over the last 24 hours, Googlebot encountered 189 errors while attempting to access your robots.txt. To ensure that we didn't crawl any pages listed in that file, we postponed our crawl. Your site's overall robots.txt error rate is 100.0%. Recommended action If the site error rate is 100%: Using a web browser, attempt to access http://www.soobumimphotography.com//robots.txt. If you are able to access it from your browser, then your site may be configured to deny access to googlebot. Check the configuration of your firewall and site to ensure that you are not denying access to googlebot. If your robots.txt is a static page, verify that your web service has proper permissions to access the file. If your robots.txt is dynamically generated, verify that the scripts that generate the robots.txt are properly configured and have permission to run. Check the logs for your website to see if your scripts are failing, and if so attempt to diagnose the cause of the failure. If the site error rate is less than 100%: Using Webmaster Tools, find a day with a high error rate and examine the logs for your web server for that day. Look for errors accessing robots.txt in the logs for that day and fix the causes of those errors. The most likely explanation is that your site is overloaded. Contact your hosting provider and discuss reconfiguring your web server or adding more resources to your website. After you think you've fixed the problem, use Fetch as Google to fetch http://www.soobumimphotography.com//robots.txt to verify that Googlebot can properly access your site.
Technical SEO | | BistosAmerica0 -
Problem wth Crawling
Hello, I have a website http://digitaldiscovery.eu here in SEOmoz. Its strange since the last week SEOmoz is crawling only one page! And before it was crwaling all the pages. Whats happening? Help SEOmoz! :))
Technical SEO | | PedroM0 -
Quick robots.txt check
We're working on an SEO update for http://www.gear-zone.co.uk at the moment, and I was wondering if someone could take a quick look at the new robots file (http://gearzone.affinitynewmedia.com/robots.txt) to make sure we haven't missed anything? Thanks
Technical SEO | | neooptic0 -
Restricted by robots.txt and soft bounce issues (related).
In our web master tools we have 35K (ish) URLs that are restricted by robots.txt and as have 1200(ish) soft 404s. WE can't seem to figure out how to properly resolve these URLs so that they no longer show up this way. Our traffic from SEO has taken a major hit over the last 2 weeks because of this. Any help? Thanks, Libby
Technical SEO | | GristMarketing0