Have I constructed my robots.txt file correctly for sitemap autodiscovery?
-
Hi,
Here is my robots.txt:
User-agent: *
Sitemap: http://www.bedsite.co.uk/sitemaps/sitemap.xml
# Directories
Disallow: /sendfriend/
Disallow: /catalog/product_compare/
Disallow: /media/catalog/product/cache/
Disallow: /checkout/
Disallow: /categories/
Disallow: /blog/index.php/
Disallow: /catalogsearch/result/index/
Disallow: /links.html
I'm using Magento and want to make sure I have constructed my robots.txt file correctly for sitemap autodiscovery.
Thanks,
-
Hey, thanks for the response. There are about 14,000 URLs in the sitemap. It shouldn't freeze up. Would you please try again?
http://www.bedsite.co.uk/sitemaps/sitemap.xml
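For anyone who wants to check a sitemap's size without opening it in a browser, here is a minimal sketch using Python's standard library. It streams the XML and counts `<url>` entries, then compares the count with the 50,000-URL-per-file limit from the sitemaps.org protocol. The inline sample sitemap and the function name `count_sitemap_urls` are made up for illustration; in practice you would pass in the downloaded file instead.

```python
import io
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
# The protocol allows at most 50,000 URLs in a single sitemap file.
MAX_URLS_PER_SITEMAP = 50_000


def count_sitemap_urls(stream):
    """Count <url> entries while streaming, so large files stay cheap."""
    count = 0
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == SITEMAP_NS + "url":
            count += 1
            elem.clear()  # release the finished element's children
    return count


# Hypothetical two-entry sitemap standing in for the real file.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/beds/</loc></url>
</urlset>"""

url_count = count_sitemap_urls(io.BytesIO(SAMPLE))
within_limit = url_count <= MAX_URLS_PER_SITEMAP
print(url_count, within_limit)
```

A sitemap with about 14,000 URLs is well inside that limit, so the entry count alone should not be a problem for crawlers.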
I know what you mean about the allow-all.
-
Also, here is the best place to get your questions answered.
From Google: "The Test robots.txt tool will show you if your robots.txt file is accidentally blocking Googlebot from a file or directory on your site, or if it's permitting Googlebot to crawl files that should not appear on the web." You can find it here
-
The robots.txt looks fine. I always add an allow-all, even though I know it is not necessary, but it makes me feel better lol.
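If you want to double-check this yourself, here is a minimal sketch using Python's standard `urllib.robotparser` (the `site_maps()` method needs Python 3.8 or later). It parses the rules from the question as an inline string instead of fetching them, and the two test URLs are hypothetical paths picked for illustration.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the question; parsed locally, nothing is fetched.
ROBOTS_TXT = """\
User-agent: *
Sitemap: http://www.bedsite.co.uk/sitemaps/sitemap.xml
# Directories
Disallow: /sendfriend/
Disallow: /catalog/product_compare/
Disallow: /media/catalog/product/cache/
Disallow: /checkout/
Disallow: /categories/
Disallow: /blog/index.php/
Disallow: /catalogsearch/result/index/
Disallow: /links.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Sitemap lines are collected independently of any User-agent group.
sitemaps = parser.site_maps()

# Hypothetical URLs: one under a disallowed directory, one at the site root.
checkout_allowed = parser.can_fetch("*", "http://www.bedsite.co.uk/checkout/cart/")
home_allowed = parser.can_fetch("*", "http://www.bedsite.co.uk/")

print(sitemaps)
print(checkout_allowed, home_allowed)
```

Python's parser is not Googlebot, so treat this as a sanity check only, not a guarantee of how any particular search engine will read the file.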
The problem you have is with the sitemap itself. How big is it? I cannot tell how many links you have because it locks up every time I open it in both Chrome and Firefox.
I also tried a tool that is designed to fetch sitemaps the way the search engines do, and it freezes up as well.
How many links do you have?