Robots.txt pattern matching

STPseo

Hello fellow SEO people!

Site: http://www.sierratradingpost.com

robots.txt: http://www.sierratradingpost.com/robots.txt

Please see the following line: Disallow: /keycodebypid~*

We are trying to block URLs like this:

http://www.sierratradingpost.com/keycodebypid~8855/for-the-home~d~3/kitchen~d~24/

but we still find them in the Google index.

1. We are not sure whether we need to tell the robot explicitly to use pattern matching.

2. We are not sure whether the format is correct. Should we use Disallow: /keycodebypid*/ or /*keycodebypid/ or even /*keycodebypid~/?

What is even more confusing is that the robots meta tag on these pages says "noindex", yet they still show up: <meta name="robots" content="noindex, follow, noarchive" />

Thank you!

SEOSHARK

OK, not sure whether this was shared already: Matt Cutts talking on this same subject.

www.youtube.com/watch?v=I2giR-WKUfY

STPseo

John, the article was a real eye-opener! Thanks again!

john4math

Somehow Google is finding these pages, but you're disallowing the Googlebot from reading the page, so it doesn't know anything about the meta noindex tag on the page. If you have meta noindex tags on all of these pages, you can remove that line in your robots.txt preventing bots from reading these pages, and as Google crawls these pages, they should remove them from their SERPs.
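To make that interaction concrete, here is a minimal Python sketch. It is an illustration only, not how Googlebot is implemented: the class and function names (RobotsMetaParser, robots_directives, crawler_sees_noindex) are made up for this example, and the HTML is just the meta tag quoted in the question wrapped in a bare page. The point is that the noindex directive lives inside the page, so a crawler that is blocked from fetching the page never gets to read it.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of any <meta name="robots" ...> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also end up here.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            for directive in content.split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def robots_directives(html):
    """Return the set of robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


def crawler_sees_noindex(html, blocked_by_robots_txt):
    """Hypothetical model: a blocked page is never fetched, so its tags stay unseen."""
    if blocked_by_robots_txt:
        return False
    return "noindex" in robots_directives(html)


page = '<html><head><meta name="robots" content="noindex, follow, noarchive" /></head></html>'
print(crawler_sees_noindex(page, blocked_by_robots_txt=True))
print(crawler_sees_noindex(page, blocked_by_robots_txt=False))
```

With the Disallow line in place the first call models the current situation; removing the line corresponds to the second call, where the tag can actually be read.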

STPseo

Great point! I will remember that. However, I have both the disallow line in the robots.txt file and the noindex meta tag, yet Google shows 3000 of them!?

http://www.google.com/search?q=site%3Awww.sierratradingpost.com+keycodebypid

cfguti

Well done John!!!

cfguti

Hi,

So you have both the robots.txt rule and the meta tag. I think the meta tag is the better option (http://www.seomoz.org/learn-seo/robotstxt).

Do you have Webmaster Tools set up for your site? You can test your robots.txt file there (http://www.google.com/support/webmasters/bin/answer.py?answer=156449).

john4math

Here's a good SEOMoz post about this: http://www.seomoz.org/blog/robot-access-indexation-restriction-techniques-avoiding-conflicts. What's most likely happening is that the disallow in robots.txt is preventing the bots from crawling the page, so they never see the meta noindex tag. And if people link to one of these pages externally, the disallow in robots.txt does not prevent the page from appearing in search results.

The robots.txt syntax you're using now looks correct to me for what you're trying to do.
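For anyone who wants to check the candidate patterns from the question by hand, here is a rough Python sketch. It assumes the wildcard rules Google documents for robots.txt: a rule is matched from the start of the URL path, * stands for any run of characters, and a trailing $ anchors the end of the URL. The helper names rule_to_regex and is_blocked are made up for this example, and it looks at a single Disallow rule in isolation (no Allow lines, no user-agent groups), so treat it as a sanity check rather than a full robots.txt parser.

```python
import re


def rule_to_regex(rule):
    """Translate one robots.txt path rule into a compiled regex."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape everything literally, then let each * match any run of characters.
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    if anchored:
        pattern += "$"
    return re.compile(pattern)


def is_blocked(path, rule):
    """True if a single Disallow rule would match the given URL path."""
    if not rule:
        # An empty Disallow value blocks nothing.
        return False
    # re.match anchors at the start of the path, which gives prefix matching.
    return rule_to_regex(rule).match(path) is not None


path = "/keycodebypid~8855/for-the-home~d~3/kitchen~d~24/"
for rule in ["/keycodebypid~*", "/keycodebypid*/", "/*keycodebypid/", "/*keycodebypid~/"]:
    print(rule, is_blocked(path, rule))
```

Under these assumptions the current rule /keycodebypid~* and the variant /keycodebypid*/ both match the example URL, while /*keycodebypid/ and /*keycodebypid~/ do not, because in the URL "keycodebypid~" is followed by digits rather than a slash. Since rules are prefix matches, the trailing * in the current rule is harmless but not required.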
