Help with robots.txt on Magento
Hi everybody,
I need your help fixing some HTML errors and crawl errors generated by Magento on my client's website, www.casabiancheria.it.
I have some problems with duplicate meta information because there are a lot of links such as
- /stampe-romagnole/tovaglie-con-tovaglioli/colore/beige,marrone,giallo,lilla/show/all.html
- /stampe-romagnole/tovaglie-con-tovaglioli/colore/beige,marrone,lilla/show/all.html
that are generated by the /colore/ filter, so they carry duplicate content and meta information.
I enabled canonical tags in Magento, but this hasn't fixed the problem yet.
The sitemap contains only one link per product, so the canonicals seem to be working, but both Google Webmaster Tools and SEOmoz are reporting errors for duplicate content and meta information.
I would like to solve these problems by using robots.txt to exclude all the URLs that contain filter parameters, such as /colore/, /price/, /dimensions/, etc. (take a look at the attachment).
I tried different solutions to exclude these links in robots.txt, but I didn't succeed.
Below you can find my current robots.txt. Can someone help me write this file correctly so that it finally excludes all the URLs generated by filters on Magento?
Finally, is it worth also excluding the images from Magento? (Take a look at the final lines of the robots.txt below.)
Thank you very much for your help!
Alberto
User-agent: *
Disallow: /CVS
Disallow: /.svn$
Disallow: /.idea$
Disallow: /.sql$
Disallow: /.tgz$
Disallow: /w1nL1f3L0g1c/
Disallow: /app/
Disallow: /downloader/
Disallow: /errors/
Disallow: /includes/
Disallow: /lib/
Disallow: /pkginfo/
Disallow: /shell/
Disallow: /var/
Disallow: /404/
Disallow: /cgi-bin/
Disallow: /magento/
Disallow: /report/
Disallow: /scripts/
Disallow: /shell/
Disallow: /skin/
Disallow: /stats/
Disallow: /api.php
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /get.php
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /README.txt
Disallow: /RELEASE_NOTES.txt
Disallow: /?dir
Disallow: /?dir=desc
Disallow: /?dir=asc
Disallow: /?limit=all
Disallow: /?mode*
Disallow: /index.php/
Disallow: /?SID=
Disallow: /checkout/
Disallow: /onestepcheckout/
Disallow: /customer/
Disallow: /customer/account/
Disallow: /customer/account/login/
Disallow: /catalogsearch/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /cgi-bin/
Disallow: /cleanup.php
Disallow: /apc.php
Disallow: /memcache.php
Disallow: /phpinfo.php
Disallow: /control/
Disallow: /customize/
Disallow: /newsletter/
Disallow: /poll/
Disallow: /review/
Disallow: /sendfriend/
Disallow: /tag/
Disallow: /wishlist/
Disallow: /catalog/product/gallery/
Disallow: /?*
Disallow: //colore/
Disallow: //price/
Disallow: //misura/
Disallow: //marca/
Disallow: //sort-by/
Disallow: //combinazione/
Disallow: /*/seleziona-colore/
Disallow: /colore/
Disallow: /price/
Disallow: /misura/
Disallow: /marca/
Disallow: /sort-by/
Disallow: /combinazione/
Disallow: /seleziona-colore/
Disallow: /*colore/
Disallow: /*price/
Disallow: /*misura/
Disallow: /*marca/
Disallow: /*sort-by/
Disallow: /*combinazione/
Disallow: /*seleziona-colore/
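One way to see which of the filter rules above could match the example URLs is to emulate the wildcard matching that Google documents for robots.txt, where `*` matches any run of characters, `$` anchors the end of the URL, and every rule is matched from the start of the path. The Python sketch below is only an approximation of that behaviour for checking rules by hand. The helper names `rule_to_regex` and `is_blocked` are made up for this illustration, and other crawlers may interpret wildcards differently.

```python
import re


def rule_to_regex(rule: str) -> re.Pattern:
    """Turn a robots.txt path rule into a regex (Google-style wildcards assumed)."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape each literal piece and let every '*' match any run of characters.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(path: str, disallow_rules: list[str]) -> bool:
    """Return True if any Disallow rule matches the path from its first character."""
    return any(rule_to_regex(rule).match(path) for rule in disallow_rules)


# One of the filter URLs from the question.
FILTER_URL = (
    "/stampe-romagnole/tovaglie-con-tovaglioli"
    "/colore/beige,marrone,lilla/show/all.html"
)

if __name__ == "__main__":
    for rule in ["/colore/", "//colore/", "/*colore/", "/*/colore/"]:
        print(rule, "->", is_blocked(FILTER_URL, [rule]))
```

Under this matching model, `/colore/` and `//colore/` do not match the filter URL because the path does not begin with those strings, while `/*colore/` and `/*/colore/` do.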
Hi,
If the duplicate-content URLs are already in the Google index, then excluding them in robots.txt will not remove them; it will just stop Googlebot from crawling them again. You could add some conditional logic to your head.phtml template file that checks for the relevant URL part and outputs a noindex,follow meta tag on the pages you don't want indexed. This is a more reliable way to make sure they are removed and not indexed in the future (be sure to test first!).
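The conditional check described above can be sketched roughly as follows. This is Python used only to illustrate the logic; the real change would live in Magento's PHP template. The function name `robots_meta_for` is hypothetical, and the list of filter segments is an assumption taken from the robots.txt in the question.

```python
# Filter path segments, assumed from the Disallow lines in the question.
FILTER_SEGMENTS = {
    "colore",
    "price",
    "misura",
    "marca",
    "sort-by",
    "combinazione",
    "seleziona-colore",
}


def robots_meta_for(path: str) -> str:
    """Pick the robots meta tag for a request path (illustrative logic only)."""
    # Ignore any query string, then split the path into its segments.
    segments = [s for s in path.split("?")[0].split("/") if s]
    if any(segment in FILTER_SEGMENTS for segment in segments):
        # Filtered listing: keep it out of the index but let links be followed.
        return '<meta name="robots" content="noindex,follow">'
    return '<meta name="robots" content="index,follow">'


if __name__ == "__main__":
    print(robots_meta_for(
        "/stampe-romagnole/tovaglie-con-tovaglioli/colore/beige,marrone/show/all.html"
    ))
    print(robots_meta_for("/stampe-romagnole/tovaglie-con-tovaglioli.html"))
```

Note that a crawler can only see the noindex tag if it is allowed to fetch the page, so the same URLs should not also be blocked in robots.txt while you wait for them to drop out of the index.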