Robot.txt pattern matching
-
Hola fellow SEO peoples!
Site: http://www.sierratradingpost.com
robot: http://www.sierratradingpost.com/robots.txt
Please see the following line: Disallow: /keycodebypid~*
We are trying to block URLs like this:
http://www.sierratradingpost.com/keycodebypid~8855/for-the-home~d~3/kitchen~d~24/
but we still find them in the Google index.
1. we are not sure if we need to specify the robot to use pattern matching.
2. we are not sure if the format is correct. Should we use Disallow: /keycodebypid*/ or /*keycodebypid/ or even /*keycodebypid~/?
What is even more confusing is that the meta robot command line says "noindex" - yet they still show up. <meta name="robots" content="noindex, follow, noarchive" />
Thank you!
-
ok, so not sure sure this was shared. Matt Cutts talking on this same subject.
|
| <cite class="kvm">www.youtube.com/watch?v=I2giR-WKUfY</cite> |
-
John, The article was a real eye-opener!Thanks again!
-
Somehow Google is finding these pages, but you're disallowing the Googlebot from reading the page, so it doesn't know anything about the meta noindex tag on the page. If you have meta noindex tags on all of these pages, you can remove that line in your robots.txt preventing bots from reading these pages, and as Google crawls these pages, they should remove them from their SERPs.
-
Great point! I will remember that. However I have both the disallow line in the robots.txt file and I also have the noindex meta command. Yet Google shows 3000 of them!?!?!?!
http://www.google.com/search?q=site%3Awww.sierratradingpost.com+keycodebypid
-
Well done John!!!
-
Hi,
then you have the robots.txt and the meta tag. I think its better the metatag (http://www.seomoz.org/learn-seo/robotstxt)
Have you WebMaster Tools in your web? you can test your robots.txt file (http://www.google.com/support/webmasters/bin/answer.py?answer=156449)
-
Here's a good SEOMoz post about this: http://www.seomoz.org/blog/robot-access-indexation-restriction-techniques-avoiding-conflicts. What's most likely happening is that the disallow in robots.txt is preventing the bots from indexing the page, so they're not going to find the meta noindex tag. If people link to one of these pages externally, the disallow in robots.txt does not prevent the page from appearing in search results.
The robots.txt syntax you're using now looks correct to me for what you're trying to do.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google Indexing Development Site Despite Robots.txt Block
Hi, A development site that has been set-up has the following Robots.txt file: User-agent: * Disallow: / In an attempt to block Google indexing the site, however this isn't the case and the development site has since been indexed. Any clues why this is or what I could do to resolve it? Thanks!
Technical SEO | | CarlWint0 -
GWT returning 200 for robots.txt, but it's actually returning a 404?
Hi, Just wondering if anyone has had this problem before. I'm just checking a client's GWT and I'm looking at their robots.txt file. In GWT, it's saying that it's all fine and returns a 200 code, but when I manually visit (or click the link in GWT) the page, it gives me a 404 error. As far as I can tell, the client has made no changes to the robots.txt recently, and we definitely haven't either. Has anyone had this problem before? Thanks!
Technical SEO | | White.net0 -
Robots.txt & Mobile Site
Background - Our mobile site is on the same domain as our main site. We use a folder approach for our mobile site abc.com/m/home.html We are re-directing traffic to our mobile site vie device detection and re-direction exists for a handful of pages of our site ie most of our pages do not redirect the user to a mobile equivalent page. Issue – Our mobile pages are being indexed in desktop Google searches Input Required – How should we modify our robots.txt so that the desktop google index does not index our mobile pages/urls User-agent: Googlebot-Mobile Disallow: /m User-agent: `YahooSeeker/M1A1-R2D2` Disallow: /m User-agent: `MSNBOT_Mobile` Disallow: /m Many thanks
Technical SEO | | CeeC-Blogger0 -
"Extremely high number of URLs" warning for robots.txt blocked pages
I have a section of my site that is exclusively for tracking redirects for paid ads. All URLs under this path do a 302 redirect through our ad tracking system: http://www.mysite.com/trackingredirect/blue-widgets?ad_id=1234567 --302--> http://www.mysite.com/blue-widgets This path of the site is blocked by our robots.txt, and none of the pages show up for a site: search. User-agent: * Disallow: /trackingredirect However, I keep receiving messages in Google Webmaster Tools about an "extremely high number of URLs", and the URLs listed are in my redirect directory, which is ostensibly not indexed. If not by robots.txt, how can I keep Googlebot from wasting crawl time on these millions of /trackingredirect/ links?
Technical SEO | | EhrenReilly0 -
Meta-robots Nofollow
I don't understand Meta-robots Nofollow. Wordpress has my homepage set to this according to SEOMoz tool. Is this really bad?
Technical SEO | | hopkinspat1 -
Robots.txt Question
In the past, I had blocked a section of my site (i.e. domain.com/store/) by placing the following in my robots.txt file: "Disallow: /store/" Now, I would like the store to be indexed and included in the search results. I have removed the "Disallow: /store/" from the robots.txt file, but approximately one week later a Google search for the URL produces the following meta description in the search results: "A description for this result is not available because of this site's robots.txt – learn more" Is there anything else I need to do to speed up the process of getting this section of the site indexed?
Technical SEO | | davidangotti0 -
How many times robots.txt gets visited by crawlers, especially Google?
Hi, Do you know if there's any way to track how often robots.txt file has been crawled? I know we can check when is the latest downloaded from webmaster tool, but I actually want to know if they download every time crawlers visit any page on the site (e.g. hundreds of thousands of times every day), or less. thanks...
Technical SEO | | linklater0 -
Redirecting a old aged site to a new exact match site?
Hi All, I have a question. I have 2 sites with me in the same sector and want some help. site 1 is a old site started back in 2003 and has some amount of links to it and has a pr 3 with some good links to it but doesn't rank much for any keywords for the timing. site 2 is a aged domain but newly developed with unique content and has a good amount of exact match with a .com version. so will there be any benefit by redirecting site 1 to site 2 to get the seo benefits and a start for link bulding? or is it best to develop and work on each site? the sector is health insurance. Thanks
Technical SEO | | macky71