Robots.txt and Magento
-
Hi,
I am working on getting my robots.txt up and running, and I'm having lots of problems with the robots.txt my developers generated: www.plasticplace.com/robots.txt
I ran the robots.txt through a syntax checking tool (http://www.sxw.org.uk/computing/robots/check.html). This is what the tool came back with: http://www.dcs.ed.ac.uk/cgi/sxw/parserobots.pl?site=plasticplace.com There seem to be many errors in the file.
Additionally, I looked at our robots.txt in Webmaster Tools (WMT), and it said the crawl was postponed because the robots.txt is inaccessible. What does that mean?
A few questions:
1. Is there a need for all the lines that have a "#" in front of them? I don't think they're necessary, but correct me if I'm wrong.
2. Furthermore, why are we blocking so many things on our website? The robots can't get past anything that requires a password anyhow, but again, correct me if I'm wrong.
3. Is there a reason why it can't just look like this:
User-agent: *
Disallow: /onepagecheckout/
Disallow: /checkout/cart/
I do understand that Magento has certain folders that you don't want crawled, but is all of this necessary, and why are there so many errors?
-
Yes, your short robots.txt idea would create a huge problem.
In your Magento admin, click Catalog > URL Rewrite Management in the menu.
There you will see the Magento feature that creates all the "pretty URLs", shown as a table. Take a value from the Target Path column and paste it after your site domain (for example, domain.com/value_in_target_path).
You'll see that the page loads fine. You don't want Google to rank those pages under the "messy" URL, so that's why you need all that stuff in your robots.txt.
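To make that concrete, here is a minimal sketch using Python's standard urllib.robotparser module. The domain, the target path catalog/category/view/id/12, and the pretty URL trash-bags.html are hypothetical placeholders, not values taken from the actual store. It checks the short two-rule file from the question against both forms of one category URL.

```python
from urllib.robotparser import RobotFileParser

# The short robots.txt proposed in the question.
SHORT_ROBOTS_TXT = """\
User-agent: *
Disallow: /onepagecheckout/
Disallow: /checkout/cart/
"""

# Hypothetical URLs for one category: the rewrite target path and its "pretty" form.
TARGET_PATH_URL = "http://www.example.com/catalog/category/view/id/12"
PRETTY_URL = "http://www.example.com/trash-bags.html"


def is_crawlable(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Print the verdict for each form of the category URL.
for url in (TARGET_PATH_URL, PRETTY_URL):
    print(url, is_crawlable(SHORT_ROBOTS_TXT, url))
```

Under these assumptions both URLs come back as crawlable, so the short file does nothing to keep the target-path version away from crawlers.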
-
I am a bit confused. Are you saying that technically my Magento site has two different URLs that can both be indexed: one with a (messy) URL and another with a vanity URL? This would create major duplicate content issues! The robots.txt would not solve such a complex issue.
Am I missing something?
-
My developer said they custom configured it to block the files they needed according to Magento.
Do you think I can simply make it look like this:
User-agent: *
Disallow: /onepagecheckout/
Disallow: /checkout/cart/
and then disable it in WMT?
-
"3. Is there a reason why it can't just look like this:"
Yes, there is. It would generate a lot of duplicate content issues. For example, your robots.txt has the following line:
Disallow: /catalog/category/view/ -> That's the "real" category URL. You can access any category in Magento either by /catalog/category/view/id or by the "pretty" URL. Because you disallow the "real" URL, only the pretty URL will be available to search engines. The same reasoning applies to many other parts of the robots.txt.
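A rough sketch of what that one rule does, using Python's standard urllib.robotparser module. Only the Disallow line comes from the robots.txt discussed here. The domain, the category id 12, and the pretty URL are hypothetical, and a real crawler may interpret rules somewhat differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Only the Disallow line below is taken from the thread; the rest is illustrative.
ROBOTS_TXT = """\
User-agent: *
Disallow: /catalog/category/view/
"""

# Hypothetical "real" and "pretty" URLs for the same category.
URLS = [
    "http://www.example.com/catalog/category/view/id/12",
    "http://www.example.com/trash-bags.html",
]


def blocked_urls(robots_txt, urls, user_agent="*"):
    """Return the subset of urls that robots_txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# List the URLs that the rule keeps crawlers away from.
print(blocked_urls(ROBOTS_TXT, URLS))
```

With these made-up URLs, only the /catalog/category/view/ form ends up in the blocked list, and the pretty URL stays open to crawlers.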
-
I assume this robots.txt was automatically created by Magento? Or was it created by a developer?
I ran it through a tool and it showed 1 error and 10 warnings, so I would say you definitely need to do something about it.
The reason for all those disallows is to try to stop search engines from indexing those paths (whether they would even find them to index if the rules were not there is debatable).
What you could do is set up robots.txt as you have suggested and then stop the search engines from indexing the directories or pages you don't want in the appropriate webmaster tools.
I don't like displaying a lot of 'don't index' paths in robots.txt, as it pretty much tells any hacker or nasty spider where your weak points may be.