Best practice for disallowing URLs with Robots.txt

centurysafety

Hi Everybody,

We are currently trying to tidy up the crawl errors that appear when we crawl the site. On first viewing we were, to say the least, very worried: 17,000+ errors. But after looking more closely at the report, we found that the majority were caused by bad URLs featuring:

Currency - For example: "directory/currency/switch/currency/GBP/uenc/aHR0cDovL2NlbnR1cnlzYWZldHkuY29tL3dvcmt3ZWFyP3ByaWNlPTUwLSZzdGFuZGFyZHM9NzEx/"
Color - For example: "?color=91"
Price - For example: "?price=650-700"
Order - For example: "?dir=desc&order=most_popular"
Page - For example: "?p=1&standards=704"
Login - For example: "customer/account/login/referer/aHR0cDovL2NlbnR1cnlzYWZldHkuY29tL2NhdGFsb2cvcHJvZHVjdC92aWV3L2lkLzQ1ODczLyNyZXZpZXctZm9ybQ,,/"
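The long strings in the currency and login URLs look like base64-encoded copies of the page the visitor was on. That would help explain the volume of errors, since each listing or product page can generate its own currency-switch and login URL. Below is a short Python sketch that decodes them, assuming Magento-style encoding (standard base64 with "+", "/" and "=" swapped for "-", "_" and ","). The platform is an inference from the URL structure, not something stated in the report.

```python
import base64


def decode_uenc(segment: str) -> str:
    """Decode an encoded URL path segment back into the URL it carries.

    Assumption: Magento-style encoding, i.e. base64 with '+', '/', '='
    replaced by '-', '_', ',' so the value is safe inside a URL path.
    """
    standard = segment.translate(str.maketrans("-_,", "+/="))
    return base64.b64decode(standard).decode("utf-8")


# The encoded segments copied from the two example URLs above.
currency_seg = "aHR0cDovL2NlbnR1cnlzYWZldHkuY29tL3dvcmt3ZWFyP3ByaWNlPTUwLSZzdGFuZGFyZHM9NzEx"
login_seg = "aHR0cDovL2NlbnR1cnlzYWZldHkuY29tL2NhdGFsb2cvcHJvZHVjdC92aWV3L2lkLzQ1ODczLyNyZXZpZXctZm9ybQ,,"

print(decode_uenc(currency_seg))  # http://centurysafety.com/workwear?price=50-&standards=711
print(decode_uenc(login_seg))     # http://centurysafety.com/catalog/product/view/id/45873/#review-form
```

Because the encoded part changes with every source page, it is likely simpler to block these by their fixed path prefix than to try to match the encoded value.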

As a novice with robots.txt, my question is: what is the best practice for stopping URLs that contain these from being crawled?

Any advice would be appreciated!

TimHolmes

If you are looking to disallow URL parameters, you could use something like the following as a convention:

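Disallow: /*? to block every URL that contains a query string, or, if you want to be more precise, a pair of rules per parameter, such as Disallow: /*?dir= and Disallow: /*&dir= (and likewise for order and p). Rules are matched from the start of the path, so Disallow: /? on its own would only catch query strings on the homepage. Several Moz questions over the last few years have covered this, and they are worth reading if you do decide to block the parameters.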
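Because robots.txt rules match from the start of the path, a per-parameter rule needs a leading /* to apply on any page, and covering both ?name= and &name= catches the parameter whether it comes first or later in the query string. The currency and login examples from the question are path-based rather than parameters, so plain prefix rules suit them better. Here is a minimal Python sketch that writes such a file; the parameter names and paths come from the question's examples, and it assumes those paths sit at the site root.

```python
def build_robots_txt(params, path_prefixes, user_agent="*"):
    """Return robots.txt text blocking the given path prefixes and query parameters.

    Each parameter gets two rules so it is caught whether it appears first in
    the query string (?name=) or after another parameter (&name=).
    """
    lines = [f"User-agent: {user_agent}"]
    for prefix in path_prefixes:
        # Plain prefix match: blocks everything beneath this path.
        lines.append(f"Disallow: {prefix}")
    for name in params:
        lines.append(f"Disallow: /*?{name}=")
        lines.append(f"Disallow: /*&{name}=")
    return "\n".join(lines) + "\n"


print(build_robots_txt(
    params=["color", "price", "dir", "order", "p"],
    path_prefixes=["/directory/currency/switch/", "/customer/account/login/"],
))
```

Whatever rules you end up with, run them through a robots.txt tester before deploying them.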

Also try to ensure that the product pages you have listed are properly canonicalised, i.e. that they point to the original product URL. A good review of how to do this can be found here. In most cases this will be enough to resolve any indexation or duplicate-content issues.
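As a rough sketch of the canonicalisation side, the canonical target for a filtered or sorted listing can be computed by stripping those parameters and emitted as a rel="canonical" link element. The FILTER_PARAMS set below is hypothetical; which parameters count as duplicates is a decision for your own site.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: filter/sort parameters treated as duplicates of the base listing.
FILTER_PARAMS = {"color", "price", "dir", "order"}


def canonical_url(url: str) -> str:
    """Drop filter/sort parameters and any fragment, keeping other parameters in order."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in FILTER_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> element for the page's <head>, HTML-escaped."""
    return f'<link rel="canonical" href="{escape(canonical_url(url))}">'


print(canonical_link_tag("http://centurysafety.com/workwear?color=91&dir=desc&order=most_popular"))
# <link rel="canonical" href="http://centurysafety.com/workwear">
```

One thing worth keeping in mind when combining this with robots.txt: a URL that robots.txt blocks cannot be crawled, so search engines will never see the canonical tag on that page.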

JordanLowry

First, I assume you have Google Webmaster Tools (Search Console) set up?

They have a robots.txt Tester in which you can try out different rules to make sure you get the syntax right. For example, the color parameter would be blocked on any page by Disallow: /*?color= (plus Disallow: /*&color= for when it isn't the first parameter), and you would follow a similar format for the others.
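If you want to sanity-check rules offline first, here is a minimal Python approximation of how Google reads them: each rule matches from the start of the path, * matches any run of characters, a trailing $ anchors the end, and the longest matching rule wins, with ties going to Allow. It is a sketch under those assumptions, not Google's implementation; it ignores user-agent groups and percent-encoding normalisation. Python's own urllib.robotparser only does plain prefix matching, which is why the wildcard handling is written by hand. It shows why a pattern without the leading wildcard, such as /?color=91*, only catches the homepage.

```python
import re


def _to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex anchored at the start."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters, then turn robots.txt '*' into '.*'.
    body = re.escape(pattern).replace(r"\*", ".*")
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(url_path: str, rules: list[tuple[str, str]]) -> bool:
    """Return True if the longest matching rule is a Disallow.

    rules: ("allow" | "disallow", pattern) pairs for one user-agent group.
    url_path: path plus query string, e.g. "/workwear?color=91".
    """
    best_len, allowed = -1, True  # no matching rule means crawling is allowed
    for kind, pattern in rules:
        if not pattern:  # an empty Disallow line blocks nothing
            continue
        if _to_regex(pattern).match(url_path):
            if len(pattern) > best_len or (len(pattern) == best_len and kind == "allow"):
                best_len, allowed = len(pattern), kind == "allow"
    return not allowed


no_wildcard = [("disallow", "/?color=91*")]
with_wildcard = [("disallow", "/*?color="), ("disallow", "/*&color=")]
for path in ["/?color=91", "/workwear?color=91", "/workwear?p=2&color=91"]:
    print(path, is_blocked(path, no_wildcard), is_blocked(path, with_wildcard))
# /?color=91 True True
# /workwear?color=91 False True
# /workwear?p=2&color=91 False True
```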

If anything is unclear, I highly recommend reading through Moz's robots.txt best-practices guide before you make any changes. Be sure to test every rule in Webmaster Tools (Search Console) > robots.txt Tester.

Let me know if you run into any problems.
