Robots file set up
-
The robots file looks like it has been set up in a very messy way.
I understand that # comments out a line. Does this mean the sitemap would
not be picked up?
Disallow: /js/ - should this be allowed instead, like /*.js$?
Disallow: /media/wysiwyg/ - this seems to be causing alerts in Webmaster Tools, as it cannot access
the images within.
Can anyone help me clean this up, please?
#Sitemap: https://examplesite.com/sitemap.xml
# Crawlers Setup
User-agent: *
Crawl-delay: 10
# Allowable Index
# Mind that Allow is not an official standard
Allow: /index.php/blog/
Allow: /catalog/seo_sitemap/category/
Allow: /catalogsearch/result/
Allow: /media/catalog/
# Directories
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /errors/
Disallow: /includes/
Disallow: /js/
Disallow: /lib/
Disallow: /magento/
Disallow: /media/
Disallow: /media/captcha/
Disallow: /media/catalog/
#Disallow: /media/css/
#Disallow: /media/css_secure/
Disallow: /media/customer/
Disallow: /media/dhl/
Disallow: /media/downloadable/
Disallow: /media/import/
#Disallow: /media/js/
Disallow: /media/pdf/
Disallow: /media/sales/
Disallow: /media/tmp/
Disallow: /media/wysiwyg/
Disallow: /media/xmlconnect/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /scripts/
Disallow: /shell/
#Disallow: /skin/
Disallow: /stats/
Disallow: /var/
# Paths (clean URLs)
Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalog/product/gallery/
Disallow: */catalog/product/upload/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
Disallow: /newsletter/
Disallow: /poll/
Disallow: /review/
Disallow: /sendfriend/
Disallow: /tag/
Disallow: /wishlist/
# Files
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txt
Disallow: /get.php # Magento 1.5+
# Paths (no clean URLs)
#Disallow: /.js$
#Disallow: /.css$
Disallow: /.php$
Disallow: /?SID=
Disallow: /rss*
Disallow: /*PHPSESSID
Disallow: /:
Disallow: /
User-agent: Fatbot
Disallow: /
User-agent: TwengaBot-2.0
Disallow: /
-
To add to this, I'd also recommend having a look around in /lib/ just to make sure you aren't blocking important JavaScript and CSS files (I've been bitten by this!).
More guidance here: https://developers.google.com/webmasters/mobile-sites/mobile-seo/common-mistakes/blocked-resources?hl=en
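If it helps, here is a rough way to check this yourself with Python's standard-library robots.txt parser. It is only a sketch: the rules are a trimmed-down copy of the ones in the question, the asset paths are made up for illustration, and the `blocked_paths` helper is my own name, not part of any library. Bear in mind that this parser does plain prefix matching and does not understand Google's `*` and `$` wildcards, so treat the result as a first pass rather than a statement of what Googlebot will do.

```python
from urllib.robotparser import RobotFileParser

# A trimmed-down copy of the rules from the question (illustrative only).
ROBOTS_TXT = """\
User-agent: *
Disallow: /js/
Disallow: /lib/
Disallow: /media/wysiwyg/
"""


def blocked_paths(robots_txt, paths, user_agent="Googlebot"):
    """Return the paths that robots_txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(user_agent, p)]


# Hypothetical asset paths; substitute the ones your pages actually load.
assets = [
    "/js/prototype/prototype.js",
    "/lib/example/styles.css",
    "/media/wysiwyg/banner.jpg",
    "/skin/frontend/default/css/styles.css",
]

for path in blocked_paths(ROBOTS_TXT, assets):
    print("blocked:", path)
```

Anything printed as blocked that your pages need for rendering (scripts, stylesheets, images) is a candidate for removing the matching Disallow line.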
-
Looks like your intuitions are pretty good! I would remove the # before Sitemap, as you have indicated. I would also remove the Disallow: /js/ line, as Google needs access to JavaScript these days and will throw a fit if it doesn't get it. I wouldn't worry about the wysiwyg directory if it only has images that you don't care about ranking.
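To see the effect of the leading # for yourself, a small sketch with Python's standard-library parser is below. The `sitemaps_in` helper is just a name I made up, the two robots.txt strings are cut-down stand-ins for the real file, and `site_maps()` needs Python 3.8 or later. Other crawlers may parse things slightly differently, but a line starting with # is treated as a comment everywhere I know of.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_in(robots_txt):
    """Return the sitemap URLs a parser sees in robots_txt (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line was found.
    return parser.site_maps() or []


# Stand-in for the current file: the Sitemap line is commented out.
commented = """\
#Sitemap: https://examplesite.com/sitemap.xml
User-agent: *
Disallow: /checkout/
"""

# Same file with the leading # removed.
uncommented = """\
Sitemap: https://examplesite.com/sitemap.xml
User-agent: *
Disallow: /checkout/
"""

print(sitemaps_in(commented))    # []
print(sitemaps_in(uncommented))  # ['https://examplesite.com/sitemap.xml']
```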