Robots.txt Syntax

RodrigoStockebrand

Does the order of the robots.txt syntax matter in SEO?

For example (are there potential problems with this format):

User-agent: *
Sitemap: 
Disallow: /form.htm
Allow: /

Disallow: /cgnet_directory

dohertyjf

Rodrigo -

Thanks, and thanks for the follow-up. To be honest with you though...I have not seen or experienced anything about this. I tend to follow the suggested rules with code

So my answer is "I don't know". Anyone else know?

I also agree with you on the meta tags. Robots.txt is best used for disallowing folders and such, not pages. For instance, I might do a "Disallow: /admin" in the robots.txt file, but would never block a category page or something to that effect. If I wanted to remove it from the index, I'd also use the meta "noindex,follow" attribute. Good point!

AlgoFreaks

Thanks John- good response. I think the biggest takeaway for me is to know that none of the "dis-order" above will actually cause errors in the file. However, I completely agree with your recommendations as to where the sitemap: should go, and why the allow parameter is unnecessary.

Last question, do you know if the blank line in-between the allow: and second disallow: parameter cause any issues?

side note for those using the robots.txt to block content, also consider the noindex,follow attribute in the META tag as an alternative to save some link value that those pages may be getting.

dohertyjf

Rodrigo -

Good question. The syntax does in fact matter, though not necessarily for SEO rankings. It matters because if you screw up your robots.txt, you can inadvertently disallow your whole site (I did it last week. Not pretty. Blog post forthcoming).

To get to your question, it is usually best to put the "Sitemap: " line at the bottom of the robots.txt, but it is not required to have it there, so far as I know.

You do not need the Allow: / parameter, because if you leave it out, Google assumes that you want everything indexed except what is put in the "Disallow: " lines.

In your case, you are disallowing "http://www.site.com/form.htm" and everything in your cgnet_directory folder. If you want everything in these folders hidden from crawlers...you have done exactly what you need to do.

I'm still learning about this, so I'm open to any correction the rest of the community has.

Welcome to the Q&A Forum

Browse the forum for helpful insights and fresh discussions about all things SEO.

Robots.txt Syntax

Got a burning SEO question?

Browse Questions

Explore more categories

Related Questions

Do robot.txts permanently affect websites even after they have been removed?

3,511 Pages Indexed and 3,331 Pages Blocked by Robots

Adding multi-language sitemaps to robots.txt

Google insists robots.txt is blocking... but it isn't.

Google authorship syntax, plus no follow

Help needed with robots.txt regarding wordpress!

How long does it take for traffic to bounce back from and accidental robots.txt disallow of root?

Robots.txt and 301