Robots exclusion

Horizon

Hi All,

I have an issue whereby print versions of my articles are being flagged up as "duplicate" content / page titles.

To get around this, I feel the easiest way is to add them to my robots.txt file with a disallow rule. Here is my URL makeup:

Normal article: www.mysite.com/displayarticle=12345

Print version of my article: www.mysite.com/displayarticle=12345&printversion=yes

I know that having dynamic parameters in my URL is not best practise to say the least, but I'm stuck with this for the time being... My question is, how do I add just the print versions of articles to my robots file without disallowing articles too? Can I just add the parameter to the document like so?

Disallow: &printversion=yes

I also know that I can add a meta noindex, nofollow tag to the head of my print versions, but I feel a robots.txt disallow will be somewhat easier...

Many thanks in advance.

Matt

ShaMenz

Hi Matt,

I would agree 100% with Ryan's comments on robots.txt.

In addition to this, it is a notoriously unreliable method for keeping pages out of the index. robots.txt only stops compliant crawlers from fetching a page; if a search engine finds the URL via an external link, it can still index the bare URL without ever crawling it.

Sha.

RyanKent

I also know that I can add a meta noindex, nofollow tag to the head of my print versions, but I feel a robots.txt disallow will be somewhat easier...

A simple rule of SEO. Never ever ever use your robots.txt file to block a page unless there is no other reasonable means of blocking the page.
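On the syntax question itself, for the case where robots.txt really is the only option: the rule as written in the question would most likely never match, because robots.txt rules are compared against the URL path from its first character, which is always "/". Crawlers that support wildcards (Google and Bing do) would need something like `Disallow: /*&printversion=yes`. The Python sketch below is only an approximation of that wildcard matching, not any crawler's actual code, and `rule_matches` is a made-up name for illustration.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Approximate wildcard matching of a robots.txt rule against a URL path.

    "*" matches any run of characters and a trailing "$" anchors the end.
    Matching always starts at the first character of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with ".*" for each "*".
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None


print_path = "/displayarticle=12345&printversion=yes"
article_path = "/displayarticle=12345"

print(rule_matches("&printversion=yes", print_path))      # rule from the question
print(rule_matches("/*&printversion=yes", print_path))    # wildcard rule
print(rule_matches("/*&printversion=yes", article_path))  # normal article
```

Under these assumptions, only the wildcard rule catches the print version, and it leaves the normal article alone.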

Use the "noindex" meta tag on your print pages. Do not use the "noindex, nofollow" tag, as you are then telling search engines not to trust links to your own site, which is a bad move.
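As a rough illustration, here is a minimal Python sketch of how a page template might decide whether to emit that tag. The function names are hypothetical, and it assumes the print flag appears exactly as `printversion=yes` in the URL, as in the question; how the tag actually gets into the head depends on your CMS.

```python
def is_print_version(url: str) -> bool:
    """Return True if the URL carries the printversion=yes parameter."""
    # The URLs in the question have no "?" separator, so treat "?" and "&"
    # alike and look at everything after the first segment.
    parts = url.replace("?", "&").split("&")
    return "printversion=yes" in parts[1:]


def robots_meta_tag(url: str) -> str:
    """Return a noindex robots meta tag for print versions, else an empty string."""
    if is_print_version(url):
        # "noindex" only: links on the page can still be followed.
        return '<meta name="robots" content="noindex">'
    return ""


print(robots_meta_tag("www.mysite.com/displayarticle=12345&printversion=yes"))
print(robots_meta_tag("www.mysite.com/displayarticle=12345"))
```

Note that the print pages must stay crawlable for this to work: if they are also disallowed in robots.txt, the crawler never fetches them and never sees the tag.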
