RegEx help needed for robots.txt potential conflict

MSTJames

I've created a robots.txt file for a new Magento install, based on an existing robots.txt example from the Magento help forums, but there is one part I can't decipher. It seems that I am both allowing and disallowing access to the same expression for pagination. My robots.txt file (and a lot of other Magento examples, it seems) includes both:

Allow: /*?p=

and

Disallow: /?p=&

I've searched for help on RegEx and I can't see what "&" does, but it seems to me that I'm allowing crawler access to all pagination URLs and then possibly disallowing access to any pagination URL that includes anything other than just the page number?

I've looked at several resources and there is practically no reference to what "&" does...

Can anyone shed any light on this, to ensure I am allowing suitable access to a shop?

Thanks in advance for any assistance

Marcus_Miller

Hey James

It looks to me like you are just disallowing access to any URLs that have more than the initial p= variable. So, you are reducing the impact of potential duplication through searches and the like.

Good

?p=1

Bad

?p=1&q=search string

I am no Magento expert, but this seems to be a simple attempt to reduce the myriad duplication that can happen with search pages and the like inside a complex CMS like Magento.

The SEOMoz crawler tool should give you some good insight. To be sure, try removing the 'Disallow: /?p=&' line and see if you get a bucketload of duplicate content warnings.

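Ultimately, the thing to remember here is that the & is part of the URL and not part of the regex: it is just the literal separator between query-string parameters. robots.txt patterns are not full regular expressions anyway; the only special characters are * (any run of characters) and a trailing $ (end of URL).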
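To make the matching concrete, here is a rough Python sketch of how a crawler that follows the common "longest matching rule wins" convention might evaluate these two lines. The helper names `rule_to_regex` and `is_allowed` are made up for illustration, and the example paths are hypothetical. One big assumption: the Disallow line is tested both exactly as posted (`/?p=&`) and in the form `/*?p=*&`, on the guess that the forum formatting may have swallowed some asterisks. Other crawlers may resolve conflicts differently, so treat this as an approximation rather than how any particular bot works.

```python
import re

def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    Only '*' (any run of characters) and a trailing '$' (end of URL)
    are special; '?', '=' and '&' are matched literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal chunk and join the chunks with '.*' for every '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Longest matching pattern wins; Allow wins a tie; no match means allowed."""
    best_len, allowed = -1, True
    for directive, pattern in rules:
        # re.match anchors at the start of the path, like robots.txt rules do.
        if rule_to_regex(pattern).match(path):
            is_allow = directive.lower() == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed

# ASSUMPTION: wildcard form of the Disallow rule (asterisks possibly lost in the post).
rules_assumed = [("Allow", "/*?p="), ("Disallow", "/*?p=*&")]
# The Disallow rule exactly as it appears in the question.
rules_posted = [("Allow", "/*?p="), ("Disallow", "/?p=&")]

print(is_allowed("/shoes?p=2", rules_assumed))        # True: only the Allow rule matches
print(is_allowed("/shoes?p=2&q=red", rules_assumed))  # False: the longer Disallow rule wins
print(is_allowed("/shoes?p=2&q=red", rules_posted))   # True: the literal rule does not match
```

Under the wildcard assumption the two lines do not conflict: plain pagination URLs stay crawlable and anything with an extra parameter after p= is blocked. Taken literally as posted, the Disallow line would only match paths that begin with exactly /?p=& and so would block very little.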

Hope that helps!
Marcus
