Robots file setup
-
The robots file looks like it has been set up in a very messy way.
I understand the # will comment out a line; does this mean the sitemap would not be picked up?
Disallow: /js/ - should this be allowed instead, like /*.js$?
Disallow: /media/wysiwyg/ - this seems to be causing alerts in Webmaster Tools, as Google cannot access the images within it.
Can anyone help me clean this up, please?
#Sitemap: https://examplesite.com/sitemap.xml

# Crawlers Setup
User-agent: *
Crawl-delay: 10

# Allowable Index
# Mind that Allow is not an official standard
Allow: /index.php/blog/
Allow: /catalog/seo_sitemap/category/
Allow: /catalogsearch/result/
Allow: /media/catalog/

# Directories
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /errors/
Disallow: /includes/
Disallow: /js/
Disallow: /lib/
Disallow: /magento/
Disallow: /media/
Disallow: /media/captcha/
Disallow: /media/catalog/
#Disallow: /media/css/
#Disallow: /media/css_secure/
Disallow: /media/customer/
Disallow: /media/dhl/
Disallow: /media/downloadable/
Disallow: /media/import/
#Disallow: /media/js/
Disallow: /media/pdf/
Disallow: /media/sales/
Disallow: /media/tmp/
Disallow: /media/wysiwyg/
Disallow: /media/xmlconnect/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /scripts/
Disallow: /shell/
#Disallow: /skin/
Disallow: /stats/
Disallow: /var/

# Paths (clean URLs)
Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalog/product/gallery/
Disallow: */catalog/product/upload/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
Disallow: /newsletter/
Disallow: /poll/
Disallow: /review/
Disallow: /sendfriend/
Disallow: /tag/
Disallow: /wishlist/

# Files
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txt
Disallow: /get.php # Magento 1.5+

# Paths (no clean URLs)
#Disallow: /*.js$
#Disallow: /*.css$
Disallow: /*.php$
Disallow: /*?SID=
Disallow: /rss*
Disallow: /*PHPSESSID
Disallow: /:
Disallow: /

User-agent: Fatbot
Disallow: /

User-agent: TwengaBot-2.0
Disallow: /
-
To add to this, I'd also recommend having a look around in /lib/ to make sure you aren't blocking important JavaScript and CSS files (I've been bitten by this!).
More guidance here: https://developers.google.com/webmasters/mobile-sites/mobile-seo/common-mistakes/blocked-resources?hl=en
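One way to see which asset URLs a set of rules actually blocks is to evaluate them the way Google documents it: * matches any run of characters, a trailing $ anchors the end of the URL, the longest matching rule wins, and Allow wins a tie between equally long rules. Python's built-in urllib.robotparser only does plain prefix matching, so this is a small hand-written matcher instead. It is a simplified sketch under those assumptions: it handles a single user-agent group, uses only a subset of the rules above, and the test paths (such as /lib/example/script.js) are hypothetical placeholders, not real files on this site.

```python
import re
from typing import List, Tuple

# One user-agent group's rules as (directive, path pattern) pairs.
Rules = List[Tuple[str, str]]


def pattern_to_regex(pattern: str):
    """Translate a Google-style robots.txt path pattern into a compiled regex.

    '*' matches any run of characters and a trailing '$' anchors the end of
    the URL; everything else is matched literally from the start of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(path: str, rules: Rules) -> bool:
    """Return True if `path` may be crawled under `rules`.

    The longest matching pattern wins; if an Allow and a Disallow pattern are
    equally long, Allow wins. A path that no rule matches is allowed.
    """
    best_len = -1
    allowed = True
    for directive, pattern in rules:
        if not pattern:  # an empty "Disallow:" blocks nothing
            continue
        if pattern_to_regex(pattern).match(path):
            is_allow = directive.lower() == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len = len(pattern)
                allowed = is_allow
    return allowed


# A subset of the rules from the file above, plus the "Allow: /*.js$" idea
# from the question. The test paths are hypothetical examples.
RULES: Rules = [
    ("Allow", "/media/catalog/"),
    ("Allow", "/*.js$"),
    ("Disallow", "/js/"),
    ("Disallow", "/lib/"),
    ("Disallow", "/media/"),
    ("Disallow", "/media/catalog/"),
    ("Disallow", "/media/wysiwyg/"),
    ("Disallow", "/*.php$"),
]

for url_path in [
    "/js/example/widget.js",         # allowed: /*.js$ (6 chars) beats /js/ (4)
    "/js/example/widget.css",        # blocked by /js/
    "/lib/example/script.js",        # allowed: /*.js$ beats /lib/ (5)
    "/media/wysiwyg/banner.jpg",     # blocked by /media/wysiwyg/
    "/media/catalog/product/a.jpg",  # allowed: Allow wins the 15-char tie
    "/index.php",                    # blocked by /*.php$
]:
    print("allowed" if is_allowed(url_path, RULES) else "blocked", url_path)
```

Under these longest-match semantics, adding Allow: /*.js$ would let .js files under /js/ and /lib/ through while CSS in those folders, and everything under /media/wysiwyg/, stays blocked.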
-
Looks like your intuitions are pretty good! I would remove the # before Sitemap, as you suggested. I would also remove the Disallow: /js/ line, as Google needs access to JavaScript these days and will throw a fit if it can't reach it. I wouldn't worry about the wysiwyg directory if it only holds images that you don't care about ranking.
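The sitemap point can be checked with Python's standard urllib.robotparser, which strips everything after a # before reading a line (site_maps() needs Python 3.8 or later). This is a minimal sketch using a shortened copy of the file; read_sitemaps is a helper name chosen for illustration.

```python
from urllib.robotparser import RobotFileParser


def read_sitemaps(robots_txt: str):
    """Parse robots.txt text with the standard library and return the
    Sitemap URLs it declares, or None if it declares none."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps()  # available from Python 3.8


# A shortened copy of the file from the question.
commented = """#Sitemap: https://examplesite.com/sitemap.xml
User-agent: *
Crawl-delay: 10
Disallow: /js/
"""
uncommented = commented.replace("#Sitemap:", "Sitemap:", 1)

print(read_sitemaps(commented))    # None: everything after '#' is a comment
print(read_sitemaps(uncommented))  # ['https://examplesite.com/sitemap.xml']
```

A commented-out Sitemap line is ignored just like any other comment, so the directive only takes effect once the # is removed.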
Related Questions
-
Should we set up redirects for all deleted TAGS?
We recently found our site had 65,000 tags (yes, 65K). In an effort to consolidate these, we've started deleting them. Moz is now reporting a heap of 404 errors for tag pages. These tag pages should not have any links pointing to them, so I'm not sure why they're being crawled. Any suggestions from experience in this area would be useful.
Technical SEO | wearehappymedia
-
Technical guide for setting up a CDN to host our images, creating an image sitemap, and setting up the CDN in GWT?
Hi All! We're thinking of setting up a CDN to host our images with a CNAME on a subdomain of our site. In terms of SEO, I was wondering if any of you knew of a pretty complete technical guide for setting it all up. Including whether or not we need to create an image sitemap, and setting it up in GWT. Thanks in advance! Vince
Technical SEO | jbrisebois
-
HTTP status showing up in Open Site Explorer top pages as blocked by robots.txt file
I'm trying to find an answer to this: there are a lot of URLs on this page with no data when I go into the data source and search for noindex or robots.txt, but the site is visible in the search engines. Why?
Technical SEO | ReSEOlve
-
Google is indexing blocked content in robots.txt
Hi, Google is indexing some URLs that I don't want indexed, and it is also indexing the same URLs over HTTPS. These URLs are blocked in the robots.txt file. I've tried to block them through Google Webmaster Tools, but Google doesn't let me because the URLs are HTTPS. The robots.txt file is correct, so what can I do to keep this content from being indexed?
Technical SEO | elisainteractive
-
RegEx help needed for robots.txt potential conflict
I've created a robots.txt file for a new Magento install and used an existing site-map that was on the Magento help forums, but the trouble is I can't decipher something. It seems that I am allowing and disallowing access to the same expression for pagination. My robots.txt file (and a lot of other Magento site-maps, it seems) includes both Allow: /*?p= and Disallow: /?p=&. I've searched for help on RegEx and I can't see what "&" does, but it seems to me that I'm allowing crawler access to all pagination URLs, then possibly disallowing access to all pagination URLs that include anything other than just the page number? I've looked at several resources and there is practically no reference to what "&" does. Can anyone shed any light on this, to ensure I am allowing suitable access to a shop? Thanks in advance for any assistance.
Technical SEO | MSTJames
-
Yoast plugin - title settings
Hi, I am using the Yoast plugin and having problems with titles. For example, my recent post http://www.soobumimphotography.com/bulverde-realtor-headshot/ is showing as "Bulverde Realtor Headshot | San Antonio Headshot PhotographerSan Antonio Wedding Photography Journal". So basically, the homepage title follows on every single page and post. I would like it to be "Bulverde Realtor Headshot | San Antonio Headshot Photographer". Could you help with this?
Technical SEO | BistosAmerica
-
File from godaddy.com
Hi, one of our clients has received a file from godaddy.com, where his site is hosted. Here is the message from the client: "i submitted my site for Search Engine Visibility,but they got some issue on the site need to be fixed. i tried myself could not fix it" The site in question is http://allkindofessays.com/. Is there any problem with the site?
Technical SEO | seoug_2005
-
How do I set up a site review for a password protected site?
We need to conduct an SEO analysis for a website that is on a private, password-protected development site. Is there any way for SEOMoz tools to access and analyze a password-protected site? Thank you, Sara Merten
Technical SEO | kev11