Search Engine Blocked by Robot Txt warnings for Filter Search result pages--Why?
-
Hi,
We're getting 'Yellow' Search Engine Blocked by Robot Txt warnings for URLS that are in effect product search filter result pages (see link below) on our Magento ecommerce shop. Our Robot txt file to my mind is correctly set up i.e. we would not want Google to index these pages. So why does SeoMoz flag this type of page as a warning? Is there any implication for our ranking? Is there anything we need to do about this? Thanks.
Here is an example url that SEOMOZ thinks that the search engines can't see.
http://www.site.com/audio-books/audio-books-in-english?audiobook_genre=132
Below are the current entries for the robot.txt file.
User-agent: Googlebot
Disallow: /index.php/
Disallow: /?
Disallow: /.js$
Disallow: /.css$
Disallow: /checkout/
Disallow: /tag/
Disallow: /catalogsearch/
Disallow: /review/
Disallow: /app/
Disallow: /downloader/
Disallow: /js/
Disallow: /lib/
Disallow: /media/
Disallow: /.php$
Disallow: /pkginfo/
Disallow: /report/
Disallow: /skin/
Disallow: /utm
Disallow: /var/
Disallow: /catalog/
Disallow: /customer/
Sitemap: -
Thanks Keri for your advice
-
Thanks Rick for your advice
-
Like Rick said, it's just a "hey, make sure that you really wanted to do this" type warning, since you can easily write a robots.txt that blocks things you didn't really think would be blocked. Or someone else can modify the robots.txt without telling you, and this can be a warning that you need to go find someone and get that fixed.
-
So what your saying is:
1. SEOmoz says these pages can't get indexed by search engines because of our robot.txt
2. We don't want these pages indexed and blocked them using robots.txt
My initial reaction is: no problem, SEOmoz is just showing you as a 'confirmation warning' that these pages are not indexed, but since you did that on purpose, it's okay.
Hope this helps!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Added 301 redirects, pages still earning duplicate content warning
We recently added a number of 301 redirects for duplicate content pages, but even with this addition they are still showing up as duplicate content. Am I missing something here? Or is this a duplicate content warning I should ignore?
Technical SEO | | cglife0 -
Search results indexed
Hi there, is is bad practice in seo to have search results for products indexed? For example a search result of holidays to Ibiza, with lots of deals coming up? its a search query url that would be indexed, with just an image and price per product on the page, with about 10 per page? Any advice appreciated.
Technical SEO | | pauledwards0 -
Timely use of robots.txt and meta noindex
Hi, I have been checking every possible resources for content removal, but I am still unsure on how to remove already indexed contents. When I use robots.txt alone, the urls will remain in the index, however no crawling budget is wasted on them, But still, e.g having 100,000+ completely identical login pages within the omitted results, might not mean anything good. When I use meta noindex alone, I keep my index clean, but also keep Googlebot busy with indexing these no-value pages. When I use robots.txt and meta noindex together for existing content, then I suggest Google, that please ignore my content, but at the same time, I restrict him from crawling the noindex tag. Robots.txt and url removal together still not a good solution, as I have failed to remove directories this way. It seems, that only exact urls could be removed like this. I need a clear solution, which solves both issues (index and crawling). What I try to do now, is the following: I remove these directories (one at a time to test the theory) from the robots.txt file, and at the same time, I add the meta noindex tag to all these pages within the directory. The indexed pages should start decreasing (while useless page crawling increasing), and once the number of these indexed pages are low or none, then I would put the directory back to robots.txt and keep the noindex on all of the pages within this directory. Can this work the way I imagine, or do you have a better way of doing so? Thank you in advance for all your help.
Technical SEO | | Dilbak0 -
Internal search : rel=canonical vs noindex vs robots.txt
Hi everyone, I have a website with a lot of internal search results pages indexed. I'm not asking if they should be indexed or not, I know they should not according to Google's guidelines. And they make a bunch of duplicated pages so I want to solve this problem. The thing is, if I noindex them, the site is gonna lose a non-negligible chunk of traffic : nearly 13% according to google analytics !!! I thought of blocking them in robots.txt. This solution would not keep them out of the index. But the pages appearing in GG SERPS would then look empty (no title, no description), thus their CTR would plummet and I would lose a bit of traffic too... The last idea I had was to use a rel=canonical tag pointing to the original search page (that is empty, without results), but it would probably have the same effect as noindexing them, wouldn't it ? (never tried so I'm not sure of this) Of course I did some research on the subject, but each of my finding recommanded one of the 3 methods only ! One even recommanded noindex+robots.txt block which is stupid because the noindex would then be useless... Is there somebody who can tell me which option is the best to keep this traffic ? Thanks a million
Technical SEO | | JohannCR0 -
Robots.txt
Hello Everyone, The problem I'm having is not knowing where to have the robots.txt file on our server. We have our main domain (company.com) with a robots.txt file in the root of the site, but we also have our blog (company.com/blog) where were trying to disallow certain directories from being crawled for SEO purposes... Would having the blog in the sub-directory still need its own robots.txt? or can I reference the directories i don't want crawled within the blog using the root robots.txt file? Thanks for your insight on this matter.
Technical SEO | | BailHotline0 -
Robots.txt not working?
Hello This is my robots.txt file http://www.theprinterdepo.com/Robots.txt However I have 8000 warnings on my dashboard like this:4 What am I missing on the file¿ Crawl Diagnostics Report On-Page Properties <dl> <dt>Title</dt> <dd>Not present/empty</dd> <dt>Meta Description</dt> <dd>Not present/empty</dd> <dt>Meta Robots</dt> <dd>Not present/empty</dd> <dt>Meta Refresh</dt> <dd>Not present/empty</dd> </dl> URL: http://www.theprinterdepo.com/catalog/product_compare/add/product/100/uenc/aHR0cDovL3d3dy50aGVwcmludGVyZGVwby5jb20vaHAtbWFpbnRlbmFjZS1raXQtZm9yLTQtbGo0LWxqNS1mb3ItZXhjaGFuZ2UtcmVmdWJpc2hlZA,,/ 0 Errors No errors found! 1 Warning 302 (Temporary Redirect) Found about 5 hours ago <a class="more">Read More</a>
Technical SEO | | levalencia10 -
Title Tag for homepage Not displaying correctly in Google Search Results
Hi We have a problem with how Google is is displaying our title tag when a user does a search on our company name Increation. I have attached two images of how the title tag for our Homepage is dispayed in Google Search Results. One of them is "increations" - this is the result which is displaying incorrectly. The other shows Kitchens | Bathrooms | | Luxury Kitchens | - which is how we want the title tag to be dispalyed. This inorreclt result is only displayed when a user types in Increation into Google. (BTW it only happens on Google, not the other search engines). To give you some background an external SEO company did some work for us (before I joined as usual :-)). They built a lot of anchor links with the word "Increations" pointing back to our home page. Is this whats causing the probelm? Also how can we resolve this issue so that the correct Title Tag is displayed. Thanks for your help. Mik increation_search_results_google#hyhfZ
Technical SEO | | increation0