Google Webmaster Tools is saying "Sitemap contains urls which are blocked by robots.txt" after Https move...
-
Hi Everyone,
I really don't see anything wrong with our robots.txt file after our https move that just happened, but Google says all URLs are blocked. The only change I know we need to make is changing the sitemap url to https. Anything you all see wrong with this robots.txt file?
robots.txt
This file is to prevent the crawling and indexing of certain parts
of your site by web crawlers and spiders run by sites like Yahoo!
and Google. By telling these "robots" where not to go on your site,
you save bandwidth and server resources.
This file will be ignored unless it is at the root of your host:
Used: http://example.com/robots.txt
Ignored: http://example.com/site/robots.txt
For more information about the robots.txt standard, see:
http://www.robotstxt.org/wc/robots.html
For syntax checking, see:
http://www.sxw.org.uk/computing/robots/check.html
Website Sitemap
Sitemap: http://www.bestpricenutrition.com/sitemap.xml
Crawlers Setup
User-agent: *
Allowable Index
Allow: /*?p=
Allow: /index.php/blog/
Allow: /catalog/seo_sitemap/category/Directories
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /includes/
Disallow: /lib/
Disallow: /magento/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /stats/
Disallow: /var/Paths (clean URLs)
Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
Disallow: /newsletter/
Disallow: /poll/
Disallow: /review/
Disallow: /sendfriend/
Disallow: /tag/
Disallow: /wishlist/
Disallow: /aitmanufacturers/index/view/
Disallow: /blog/tag/
Disallow: /advancedreviews/abuse/reportajax/
Disallow: /advancedreviews/ajaxproduct/
Disallow: /advancedreviews/proscons/checkbyproscons/
Disallow: /catalog/product/gallery/
Disallow: /productquestions/index/ajaxform/Files
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txtPaths (no clean URLs)
Disallow: /.php$
Disallow: /?SID=
disallow: /?cat=
disallow: /?price=
disallow: /?flavor=
disallow: /?dir=
disallow: /?mode=
disallow: /?list=
disallow: /?limit=5
disallow: /?limit=10
disallow: /?limit=15
disallow: /?limit=20
disallow: /*?limit=25 -
Thanks again for the response. Looks like it just took a little more time for Google to resolve the issue. No more errors. Didn't do anything but resubmit Sitemap and Robots.txt.
Thanks for the tips as well. I am going to post one more question in another thread.
-
Jeff,
I was only able to find only ONE URL in the sitemap that is blocked by the robots.txt that you've posted in this question.
Check the image attached.
The URL is: https://www.bestpricenutrition.com/catalog/product/view/id/15650.htmlWhat did I do? A manual search of all the disallowed terms in the sitemap.
Also, you might want to take a comprehensive read at this article about robots.txt. It helped me to find that mistake.
The complete guide to Robots.txt - Portent.comBest Luck.
GR. -
Thanks for the quick response.
-
Yes...Google Webmaster Tools is giving examples...and they are basically all the product pages.
-
Did the Add Site under Google Webmaster Tools yes...this is from that new 'account'.
-
Yes...we are fixing that.
You see anything in that robots.text above that would indicate we are blocking https product pages?
-
-
Hello Jeff,
Just some routine questions to establish a base line:
- Have you checked that the sitemap doesnt include any of the disallowed URLs?
- You said that there was a movement to HTTPS, have you created a new account for the new domain?
- Im seing that the robots.txt has the old URL for the sitemap, without the HTTPS correction.
Let me know.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Sitemap url's not being indexed
There is an issue on one of our sites regarding many of the sitemap url's not being indexed. (at least 70% is not being indexed) The url's in the sitemap are normal url's without any strange characters attached to them, but after looking into it, it seems a lot of the url's get a #. + a number sequence attached to them once you actually go to that url. We are not sure if the "addthis" bookmark could cause this, or if it's another script doing it. For example Url in the sitemap: http://example.com/example-category/0246 Url once you actually go to that link: http://example.com/example-category/0246#.VR5a Just for further information, the XML file does not have any style information associated with it and is in it's most basic form. Has anyone had similar issues with their sitemap not being indexed properly ?...Could this be the cause of many of these url's not being indexed ? Thanks all for your help.
Technical SEO | | GreenStone0 -
Having Problems to Index all URLs on Sitemap
Hi all again ! Thanks in advance ! My client's site is having problems to index all its pages. I even bought the full extension of XML Sitemaps and the number of urls increased, but we still have problems to index all of them. What are the reasons? The robots.txt is open for all robots, we only prohibit users and spiders to enter our Intranet. I've read that duplicate content and 404's can be the reason. Anything else?
Technical SEO | | Tintanus0 -
Responsive web design has a crawl error of redirecting to HTTP instead of HTTPS ? is this because of the new update of google that appreciates the HTTPs more?
We at yamsafer.me are using a Repsonsive web design! A crawl errors occured which redirects the hompage to an HTTP version instead of HTTPS? Any ideas on why this happened?
Technical SEO | | Yamsafer.com0 -
Robots.txt on refinements
In dealing with Panda do you think it is a good idea to put all refinements for category pages in the robots.txt file? We already have a lot as noindex, follow but I am wondering if it would be better to address from a crawl perspective as the pages are probably thin duplicate content to Google.
Technical SEO | | Gordian0 -
Redirecting http to https, do I need to add new url to webaster tools?
Hey all, we just 301 redirected all our http url's to https. I'm getting some funky data in webmaster tools, such as a drastic change in pages indexed and pages submitted over pages indexed. Might be a dumb question, but do I need to update my website in webmaster tools with the new https address, or should I be getting credible data from the old http url that is already in there? Thank you in advance!
Technical SEO | | jaychow0 -
Webmaster tools...URL Errors
Hi mozzers, Quick question. Whats the best thing to do about URL errors in webmaster tools. They are all 404s that point from external sites. Many of them are junk spam sites. Should I mark them as "fixed" or just leave them. I'm hoping google is aware it's out of my control if spam sites want to link to 404s on my site. Peter
Technical SEO | | PeterM220 -
Robots.txt
Hi there, My question relates to the robots.txt file. This statement: /*/trackback Would this block domain.com/trackback and domain.com/fred/trackback ? Peter
Technical SEO | | PeterM220 -
Why google index my IP URL
hi guys, a question please. if site:112.65.247.14 , you can see google index our website IP address, this could duplicate with our darwinmarketing.com content pages. i am not quite sure why google index my IP pages while index domain pages, i understand this could because of backlink, internal link and etc, but i don't see obvious issues there, also i have submit request to google team to remove ip address index, but seems no luck. Please do you have any other suggestion on this? i was trying to do change of address setting in Google Webmaster Tools, but didn't allow as it said "Restricted to root level domains only", any ideas? Thank you! boson
Technical SEO | | DarwinChinaSEO0