Best practice for disallowing URLS with Robots.txt

centurysafety

Hi Everybody,

We are currently trying to tidy up the crawling errors which are appearing when we crawl the site. On first viewing, we were very worried to say the least:17000+. But after looking closer at the report, we found the majority of these errors were being caused by bad URLs featuring:

Currency - For example: "directory/currency/switch/currency/GBP/uenc/aHR0cDovL2NlbnR1cnlzYWZldHkuY29tL3dvcmt3ZWFyP3ByaWNlPTUwLSZzdGFuZGFyZHM9NzEx/"
Color - For example: ?color=91
Price - For example: "?price=650-700"
Order - For example: ?dir=desc&order=most_popular
Page - For example: "?p=1&standards=704"
Login - For example: "customer/account/login/referer/aHR0cDovL2NlbnR1cnlzYWZldHkuY29tL2NhdGFsb2cvcHJvZHVjdC92aWV3L2lkLzQ1ODczLyNyZXZpZXctZm9ybQ,,/"

My question now is as a novice of working with Robots.txt, what would be the best practice for disallowing URLs featuring these from being crawled?

Any advice would be appreciated!

TimHolmes

If you are looking to disallow url parameters you could use something like the following as a convention.

Disallow: /? or Disallow: /?dir=&order=&p= if you wanted to be more accurate with specific parameters. There have been a few Moz questions of this type over the last few years, if you do look to remove the parameters.

Also try and ensure that the product pages you have listed are well canonicalised and point to the original product etc. A good review on how to do this can be found here. This will in most cases be enough to remove any indexation/duplicate issues.

JordanLowry

First I assume you have webmaster tools set up?

They have a robots.txt tester tool which you can test out different parameters to make sure you get the right syntax. For example color would be blocked by: Disallow: /?color=91* and you would follow that similar format more or less.

If you are confused I highly recommend reading through Moz's robots.txt best practices guide before you make any changes. Be sure to test all out in webmaster tools(search console)>robots.txt tester.

Let me know if you run into any problems.

Welcome to the Q&A Forum

Browse the forum for helpful insights and fresh discussions about all things SEO.

Best practice for disallowing URLS with Robots.txt

Got a burning SEO question?

Browse Questions

Explore more categories

Related Questions

Google Only Indexing Canonical Root URL Instead of Specified URL Parameters

Is single H1 tag still best practice?

Our parent company has included their sitemap links in our robots.txt file - will that have an impact on the way our site is crawled?

Best practices for structuring an ecommerce site

Avoiding Duplicate Content with Used Car Listings Database: Robots.txt vs Noindex vs Hash URLs (Help!)

Overly-Dynamic URLs & Changing URL Structure w Web Redesign

What would be the best domain choice?

Is it safe to redirect multiple URLs to a single URL?