Robots.txt: Syntax URL to disallow

Kuantokusta

Has anyone ever experienced "collateral damage" when disallowing URLs?

Some old URLs are still present on our website. While we are cleaning them off the site (which takes time), I would like to keep them from being indexed by using the robots.txt file.

The old URLs have the form "/brand//13", while the new ones are "/brand/samsung/13" (note that the old URLs have two slashes after the word "brand").

Do I risk removing the new, correct URLs from the SERPs if I add the line "Disallow: /brand//" to the robots.txt file?

I don't think so, but thank you to everyone who can help me clear this up.

Anti-Alex

You could inadvertently block /brand/ altogether. Just because you use a // doesn't mean Google follows the same rules when crawling.

Kuantokusta

"I wouldn't risk telling a spider to ignore /brand// because it might have adverse results."

What adverse results could be expected?

AdAgency

"(because of the 404 error pages being constantly found in our pages)"

Think of it this way:
Which is better: rerouting traffic when it's congested, or putting up a roadblock that backs up even more traffic?

Yes, it's more work to do the 301 redirects, but if you have "pages being constantly found", you should give spiders instructions to take a different path.

Now, if you are talking about an error such as:
/brand//samsung/13 SHOULD go to
/brand/samsung/13
then you could EASILY solve this with .htaccess redirects. I wouldn't risk telling a spider to ignore /brand// because it might have adverse results.
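A minimal sketch of the mapping described above, written in Python rather than .htaccess syntax: collapse any run of slashes into a single slash and answer with a 301 when that changes the path. The function names `collapse_slashes` and `redirect_for` are hypothetical; on a real site the same rule would live in the server's rewrite configuration.

```python
import re


def collapse_slashes(path: str) -> str:
    """Replace every run of two or more slashes with a single slash."""
    return re.sub(r"/{2,}", "/", path)


def redirect_for(path: str):
    """Return (301, new_path) when the path contains doubled slashes, else None."""
    new_path = collapse_slashes(path)
    return (301, new_path) if new_path != path else None


if __name__ == "__main__":
    print(redirect_for("/brand//samsung/13"))  # (301, '/brand/samsung/13')
    print(redirect_for("/brand/samsung/13"))   # None
```

Note that this only covers the /brand//samsung/13 case. The thread's actual old URLs (/brand//13) have no brand name to recover, so collapsing the slashes would yield /brand/13 rather than /brand/samsung/13.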

Kuantokusta

Hi guys,

Thank you for your answers

I understand (and agree with) your SEO point of view (301 redirects), but I should have mentioned that these old URLs have been returning a 404 error page for a long time now, so we are no longer considering their SEO strength.

My goal right now is to find a quick and simple way to tell search engines to ignore this type of old URL (because of the 404 error pages being constantly found in our pages). Doing a 301 redirect to the right page would be a bit more complex at the moment.

So: do you think there is a risk that the second slash won't be taken into account in the "Disallow" line I want to add to robots.txt? (In other words, will search engines stop crawling URLs like "/brand/samsung/13" if I add the line "Disallow: /brand//"?)
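One way to check the literal behaviour of the proposed rule is Python's standard-library robots.txt parser. A plain Disallow value is matched as a prefix of the URL path, so only paths that begin with /brand// are affected. This parser is not Googlebot, but for a rule without wildcards it applies the same prefix match that the robots.txt standard (RFC 9309) describes. The host www.example.com and the helper names are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Proposed rule from the thread, assumed to apply to every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /brand//
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content held in memory (no network fetch)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, path: str, agent: str = "Googlebot") -> bool:
    # www.example.com is a placeholder host; only the path is matched.
    return parser.can_fetch(agent, "https://www.example.com" + path)


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    for path in ["/brand//13", "/brand/samsung/13", "/brand/"]:
        print(f"{path:20} {'allowed' if is_allowed(parser, path) else 'blocked'}")
```

Run as a script, it reports /brand//13 as blocked and both /brand/samsung/13 and /brand/ as allowed.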

AdAgency

I'll add to what Highland and Alex Chan are telling you. If you are using Apache (Linux), you can redirect your old site links with 301 redirects in .htaccess, which is a very powerful tool. If you are using an IIS server, web.config is what you want to use.

A really good resource for .htaccess is CSS-Tricks: http://css-tricks.com/snippets/htaccess/301-redirects/

Anti-Alex

Yup, like Highland mentioned, using your robots.txt for this isn't a good idea. The robots.txt file isn't guaranteed to work anyway. The only surefire way to get it working is to move all the URLs from the old structure to the new one, then 301 all the old URLs to the new URLs. The 301 minimizes the loss to your SEO.

Highland

You really don't need robots.txt for that. I would either 301 the old URL (preferred) or have the old URL return a 404. Both will cause the old URL to be removed from the index. A robots.txt Disallow simply leaves the URL in the index but tells robots not to crawl it.
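Highland's two options (301 when there is a target, 404 otherwise) can be sketched as a tiny WSGI app in Python. It assumes a hypothetical `BRAND_SLUG_BY_ID` lookup from the numeric ID in /brand//13 to a brand slug, and that the web server passes the doubled slash through to `PATH_INFO` unchanged. On Apache or IIS the same logic would normally live in .htaccess or web.config instead.

```python
import re

# Hypothetical lookup from the numeric ID in an old URL to its brand slug;
# a real site would query its product catalogue instead.
BRAND_SLUG_BY_ID = {"13": "samsung"}

# Old URL pattern from the thread: an empty brand segment, e.g. /brand//13.
OLD_URL = re.compile(r"/brand//(\d+)")


def app(environ, start_response):
    """Minimal WSGI app: 301 old URLs that have a known target, 404 the rest."""
    path = environ.get("PATH_INFO", "")
    match = OLD_URL.fullmatch(path)
    if match:
        brand_id = match.group(1)
        slug = BRAND_SLUG_BY_ID.get(brand_id)
        if slug:
            # Preferred option: permanent redirect to the new URL structure.
            start_response("301 Moved Permanently",
                           [("Location", f"/brand/{slug}/{brand_id}")])
            return [b""]
        # No known target: a 404 still lets the old URL drop out of the index.
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not Found"]
    # Any other path stands in for the normal site.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"OK"]


def call(path):
    """Invoke the app directly and return (status, headers) for a path."""
    result = {}

    def start_response(status, headers):
        result["status"] = status
        result["headers"] = dict(headers)

    b"".join(app({"PATH_INFO": path}, start_response))
    return result["status"], result["headers"]


if __name__ == "__main__":
    for p in ["/brand//13", "/brand//99", "/brand/samsung/13"]:
        print(p, *call(p))
```

The 404 branch covers old IDs with no known brand, which matches the current situation in the thread while the redirect table is being built.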
