Robots.txt file question? Never seen this command before

RobMay

Hey Everyone!

Perhaps someone can help me. I came across this command in the robots.txt file of our Canadian corporate domain. I looked around online but can't seem to find a definitive answer (slightly relevant).

The line is as follows:

Disallow: /*?*

I'm guessing this might have something to do with blocking PHP query-string searches on the site. It might also have something to do with blocking subdomains, but the "?" puzzles me.

Any help would be greatly appreciated!

Thanks, Rob

omnea

I don't think this is correct.

/*?* is an attempt at using a regex in the robots file, which I don't think works.

Further, if it was a properly formed regex, it would be ?

* is a special character on the User-agent line, meaning all user agents. For the Disallow line, I believe you have to use a specific directory or page.

http://www.robotstxt.org/robotstxt.html

I could be wrong, but the info on this site has been my understanding from the past too.

AdoptionHelp

It depends on how your site is structured.

For example if you have a page at

http://www.yourdomain.com/products.php

and this shows different things based on the parameter, like:

http://www.yourdomain.com/products.php?type=widgets

then you will want to get rid of this line in your robots.txt, because it stops spiders from crawling those parameter-driven pages.

However, if the parameters don't change the content on the page, you can leave it in.

RobMay

Thanks Ryan and Ryan! I'm just unfamiliar with this command set in the robots file, and I'm still getting settled into the company (5 weeks), so I'm still learning the site's structure and architecture. With it all being new to me, and with the limitations I'm seeing on the CMS side, I was wondering if this might have been causing crawl issues for Bing and/or Yahoo. I'm trying to gauge where we might be experiencing problems with how the site gets crawled.

rhutchings

It's not a bad idea to have this in the robots.txt, but unless you are 100% confident that you won't block something you really want crawled, I would consider handling unwanted parameters and pages through the new URL parameter handling tools in Google Webmaster Tools. That way you have more control over which ones do and don't get blocked.
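One way to check what a blanket rule like this would block is to run a list of the site's URLs through a short script. This is only a rough sketch: the function name `audit_parameter_urls` and the sample URLs are made up for illustration, and in practice the list would come from your own sitemap or crawl export.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qsl


def audit_parameter_urls(urls):
    """Return the URLs that carry a query string, plus a count of parameter names.

    Any URL returned here contains a "?" followed by parameters, so a rule
    such as "Disallow: /*?*" would stop wildcard-aware crawlers from fetching it.
    """
    blocked = []
    param_counts = Counter()
    for url in urls:
        query = urlsplit(url).query
        if not query:
            continue  # no parameters, so the rule does not apply
        blocked.append(url)
        for name, _value in parse_qsl(query, keep_blank_values=True):
            param_counts[name] += 1
    return blocked, param_counts


# Hypothetical sample list, for illustration only.
sample_urls = [
    "http://www.yourdomain.com/products.php",
    "http://www.yourdomain.com/products.php?type=widgets",
    "http://www.yourdomain.com/products.php?type=gadgets&sort=price",
]

blocked, counts = audit_parameter_urls(sample_urls)
for url in blocked:
    print("would be blocked:", url)
for name, total in counts.most_common():
    print(f"parameter {name!r} appears in {total} URL(s)")
```

If a parameter that shows up here is the only way to reach real content (like `type` in the products example), that is the case where the blanket rule would hurt.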

RobMay

So, for this parameter, should I keep it in the robots file?

AdoptionHelp

It's preventing spiders from crawling pages with parameters in the URL. For example, when you search on Google you'll see a URL like so:

http://www.google.com/search?q=seo

This passes the parameter of q with a value of 'seo' to the page at google.com for it to work its magic with. This is almost definitely a good thing, unless the only way to access some content on your site is via URL parameters.
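To make the matching concrete, here is a minimal sketch of how a wildcard-aware crawler might read the rule. It assumes the wildcard extension that major crawlers such as Googlebot and Bingbot support, where `*` matches any run of characters and a trailing `$` anchors the end of the URL; the original robots.txt convention only did plain prefix matching. The helper names `rule_to_regex` and `is_blocked` are invented for this example and are not part of any library.

```python
import re
from urllib.parse import urlsplit


def rule_to_regex(rule):
    """Translate a robots.txt path rule into a compiled regex (wildcard style)."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape each literal piece, then join the pieces with ".*" for every "*".
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(url, rule="/*?*"):
    """Return True if the URL's path plus query string matches the Disallow rule."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # re.match anchors at the start, like a robots.txt rule does.
    return rule_to_regex(rule).match(target) is not None


print(is_blocked("http://www.yourdomain.com/products.php?type=widgets"))
print(is_blocked("http://www.yourdomain.com/products.php"))
print(is_blocked("http://www.google.com/search?q=seo"))
```

Under those assumptions, `/*?*` reads as "a slash, anything, a literal question mark, anything", so the first and third URLs match the rule and the plain `products.php` page does not.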
