Need Help With Robots.txt on Magento eCommerce Site
-
Hello, I am having difficulty configuring my robots.txt file properly. I am getting error emails from Google products stating they can't view our products because they are being blocked, and this past week, in my SEO dashboard, the number of URLs receiving search traffic dropped by almost 40%.
Can anyone point me to a good template robots.txt file for a Magento eCommerce website?
The one I am currently using came from this site: e-commercewebdesign.co.uk/blog/magento-seo/magento-robots-txt-seo.php. However, Google is now reporting problems because of it.
I searched and found this thread: http://www.magentocommerce.com/wiki/multi-store_set_up/multiple_website_setup_with_different_document_roots#the_root_folder_robots.txt_file. Still, I felt I should get some additional help on properly configuring a robots.txt file for a Magento site.
Thanks in advance for any help. Please, let me know if you need more info to provide assistance.
-
You'd better back up your DB before doing that. Anyway, take a look at this Magento Connect extension: http://www.magentocommerce.com/magento-connect/MageWorx.com/extension/2852/seo-suite-enterprise#overview
or this one (it's by the same company):
http://www.mageworx.com/seo-suite-pro-magento-extension.html
-
Thank you very much. We'll give that a shot and see how it goes. What started us tinkering with the robots.txt file in the first place is that Bing Shopping told us it couldn't crawl our product images. Plus, our PDF files for product specs and manuals are all stored within the media folder. Do you have a suggestion for this? I would think we should get rid of "Disallow: /media/" and replace it with the following (what do you think?):
Disallow: /media/aitmanufacturers/
Disallow: /media/bigtom_media/
Disallow: /media/css/
Disallow: /media/downloadable/
Disallow: /media/easybanner/
Disallow: /media/geoip/
Disallow: /media/icons/
Disallow: /media/import/
Disallow: /media/js/
Disallow: /media/productsfeed/
Disallow: /media/sales/
Disallow: /media/tmp/
Disallow: /media/UPS/
-
Hello,
Below is what I use. You need to have mod_rewrite enabled if you are going to disallow index.php, and even then it's still very risky. This may be part of the issue. Robots.txt is so important, but you need to know what you are doing, especially when disallowing as much as that UK site does.
Tyler
User-agent: *
Disallow: /*?
Disallow: /*.js$
Disallow: /*.css$
Disallow: /checkout/
Disallow: /catalogsearch/
Disallow: /review/
Disallow: /app/
Disallow: /downloader/
Disallow: /images/
Disallow: /js/
Disallow: /lib/
Disallow: /media/
Disallow: /*.php$
Disallow: /pkginfo/
Disallow: /report/
Disallow: /skin/
Disallow: /var/
Disallow: /customer/
Disallow: /enable-cookies/
Sitemap: http://domain.com/sitemap.xml
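Before deploying rules like these, it may help to check them against a few real URLs. Below is a rough Python sketch of Google-style matching (`*` as a wildcard, a trailing `$` as an end anchor, longest matching rule wins). It is a simplified illustration, not a full parser: it handles a single user-agent group only, and the sample paths are hypothetical placeholders, so swap in URLs from your own store. It compares the rule set above with a variant that replaces the blanket "Disallow: /media/" with the narrower subfolder list proposed earlier in the thread.

```python
import re

def rule_to_regex(pattern):
    """Turn a robots.txt path pattern into a regex anchored at the start.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_allowed(rules, path):
    """Return True if `path` may be crawled under `rules`.

    `rules` is a list of (directive, pattern) pairs for one user-agent group.
    The longest matching pattern wins; on a tie, Allow beats Disallow.
    """
    best_len, allowed = -1, True
    for directive, pattern in rules:
        if not pattern:  # an empty Disallow restricts nothing
            continue
        if rule_to_regex(pattern).match(path):
            is_allow = directive.lower() == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed

# The rule set from this post, as posted.
POSTED_PATTERNS = [
    "/*?", "/*.js$", "/*.css$", "/checkout/", "/catalogsearch/", "/review/",
    "/app/", "/downloader/", "/images/", "/js/", "/lib/", "/media/",
    "/*.php$", "/pkginfo/", "/report/", "/skin/", "/var/", "/customer/",
    "/enable-cookies/",
]
POSTED_RULES = [("Disallow", p) for p in POSTED_PATTERNS]

# Variant: drop the blanket /media/ rule and block only the subfolders
# listed earlier in the thread.
MEDIA_SUBFOLDERS = [
    "aitmanufacturers", "bigtom_media", "css", "downloadable", "easybanner",
    "geoip", "icons", "import", "js", "productsfeed", "sales", "tmp", "UPS",
]
NARROWED_RULES = [r for r in POSTED_RULES if r[1] != "/media/"] + [
    ("Disallow", "/media/%s/" % name) for name in MEDIA_SUBFOLDERS
]

# Hypothetical sample paths; replace with URLs from your own site.
SAMPLE_PATHS = [
    "/media/catalog/product/w/i/widget.jpg",  # product image
    "/media/manuals/widget-spec.pdf",         # spec sheet PDF
    "/media/tmp/upload.csv",                  # temporary file
    "/widget.html?dir=asc",                   # URL with a query string
]

if __name__ == "__main__":
    for path in SAMPLE_PATHS:
        print(path,
              "| posted:", is_allowed(POSTED_RULES, path),
              "| narrowed:", is_allowed(NARROWED_RULES, path))
```

With these sample paths, the posted rules block the product image and the PDF, while the narrowed variant lets both through and still blocks /media/tmp/ and query-string URLs. As far as I know, Python's built-in urllib.robotparser does plain prefix matching and does not interpret `*` or `$`, which is why the matching is written out by hand here. Treat the result as a sanity check and confirm with the search engines' own robots.txt testing tools.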