Why does SEOMoz crawler ignore robots.txt?
-
The SEOMoz crawler ignores robots.txt
It also "indexes" pages marked as noindex.
That means it is filling up the reports with things that don't matter.
Is there any way to stop it doing that?
-
Hi Alan,
The code should be ok
Try to "drive-test" it with a custom crawl from http://pro.seomoz.org/tools/crawl-test then you will see if it works well.
I am glad the link was useful.
Gr.,
Istvan
-
Thank you István
I added this:
User-agent: rogerbot
Disallow: /sendtoafriend/
Disallow: /photo/
Disallow: /pix/- because crawlers shouldn't go down those paths and roger is detecting pages without descriptions.
Is what I added OK?
-
Hi,
You can block RogerBot from Robots.txt
Check for further instructions on: http://www.seomoz.org/dp/rogerbot
"Please note: Adding this code will prevent our crawl test tool from being able to crawl your website."
Gr.,
Istvan
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Robots.txt blocking Moz
Moz are reporting the robots.txt file is blocking them from crawling one of our websites. But as far as we can see this file is exactly the same as the robots.txt files on other websites that Moz is crawling without problems. We have never come up against this before, even with this site. Our stats show Rogerbot attempting to crawl our site, but it receives a 404 error. Can anyone enlighten us to the problem please? http://www.wychwoodflooring.com -Christina
Moz Pro | | ChristinaRadisic0 -
Htaccess and robots.txt and 902 error
Hi this is my first question in here I truly hope someone will be able to help. It's quite a detailed problem and I'd love to be able to fix it through your kind help. It regards htaccess files and robot.txt files and 902 errors. In October I created a WordPress website from what was previously a non-WordPress site it was quite dated. I had built the new site on a sub-domain I created on the existing site so that the live site could remain live whilst I created on the subdomain. The site I built on the subdomain is now live but I am concerned about the existence of the old htaccess files and robots txt files and wonder if I should just delete the old ones to leave the just the new on the new site. I created new htaccess and robots.txt files on the new site and have left the old htaccess files there. Just to mention that all the old content files are still sat on the server under a folder called 'old files' so I am assuming that these aren't affecting matters. I access the htaccess and robots.txt files by clicking on 'public html' via ftp I did a Moz crawl and was astonished to 902 network error saying that it wasn't possible to crawl the site, but then I was alerted by Moz later on to say that the report was ready..I see 641 crawl errors ( 449 medium priority | 192 high priority | Zero low priority ). Please see attached image. Each of the errors seems to have status code 200; this seems to be applying to mainly the images on each of the pages: eg domain.com/imagename . The new website is built around the 907 Theme which has some page sections on the home page, and parallax sections on the home page and throughout the site. To my knowledge the content and the images on the pages are not duplicated because I have made each page as unique and original as possible. The report says 190 pages have been duplicated so I have no clue how this can be or how to approach fixing this. Since October when the new site was launched, approx 50% of incoming traffic has dropped off at the home page and that is still the case, but the site still continues to get new traffic according to Google Analytics statistics. However Bing Yahoo and Google show a low level of Indexing and exposure which may be indicative of the search engines having difficulty crawling the site. In Google Analytics in Webmaster Tools, the screen text reports no crawl errors. W3TC is a WordPress caching plugin which I installed just a few days ago to speed up page speed, so I am not querying anything here about W3TC unless someone spots that this might be a problem, but like I said there have been problems re traffic dropping off when visitors arrive on the home page. The Yoast SEO plugin is being used. I have included information about the htaccess and robots.txt files below. The pages on the subdomain are pointing to the live domain as has been explained to me by the person who did the site migration. I'd like the site to be free from pages and files that shouldn't be there and I feel that the site needs a clean up as well as knowing if the robots.txt and htaccess files that are included in the old site should actually be there or if they should be deleted... ok here goes with the information in the files. Site 1) refers to the current website. Site 2) refers to the subdomain. Site 3 refers to the folder that contains all the old files from the old non-WordPress file structure. **************** 1) htaccess on the current site: ********************* BEGIN W3TC Browser Cache <ifmodule mod_deflate.c=""><ifmodule mod_headers.c="">Header append Vary User-Agent env=!dont-vary</ifmodule>
Moz Pro | | SEOguy1
<ifmodule mod_filter.c="">AddOutputFilterByType DEFLATE text/css text/x-component application/x-javascript application/javascript text/javascript text/x-js text/html text/richtext image/svg+xml text/plain text/xsd text/xsl text/xml image/x-icon application/json
<ifmodule mod_mime.c=""># DEFLATE by extension
AddOutputFilter DEFLATE js css htm html xml</ifmodule></ifmodule></ifmodule> END W3TC Browser Cache BEGIN W3TC CDN <filesmatch ".(ttf|ttc|otf|eot|woff|font.css)$"=""><ifmodule mod_headers.c="">Header set Access-Control-Allow-Origin "*"</ifmodule></filesmatch> END W3TC CDN BEGIN W3TC Page Cache core <ifmodule mod_rewrite.c="">RewriteEngine On
RewriteBase /
RewriteCond %{HTTP:Accept-Encoding} gzip
RewriteRule .* - [E=W3TC_ENC:_gzip]
RewriteCond %{HTTP_COOKIE} w3tc_preview [NC]
RewriteRule .* - [E=W3TC_PREVIEW:_preview]
RewriteCond %{REQUEST_METHOD} !=POST
RewriteCond %{QUERY_STRING} =""
RewriteCond %{REQUEST_URI} /$
RewriteCond %{HTTP_COOKIE} !(comment_author|wp-postpass|w3tc_logged_out|wordpress_logged_in|wptouch_switch_toggle) [NC]
RewriteCond "%{DOCUMENT_ROOT}/wp-content/cache/page_enhanced/%{HTTP_HOST}/%{REQUEST_URI}/_index%{ENV:W3TC_PREVIEW}.html%{ENV:W3TC_ENC}" -f
RewriteRule .* "/wp-content/cache/page_enhanced/%{HTTP_HOST}/%{REQUEST_URI}/_index%{ENV:W3TC_PREVIEW}.html%{ENV:W3TC_ENC}" [L]</ifmodule> END W3TC Page Cache core BEGIN WordPress <ifmodule mod_rewrite.c="">RewriteEngine On
RewriteBase /
RewriteRule ^index.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]</ifmodule> END WordPress ....(((I have 7 301 redirects in place for old page url's to link to new page url's))).... #Force non-www:
RewriteEngine on
RewriteCond %{HTTP_HOST} ^www.domain.co.uk [NC]
RewriteRule ^(.*)$ http://domain.co.uk/$1 [L,R=301] **************** 1) robots.txt on the current site: ********************* User-agent: *
Disallow:
Sitemap: http://domain.co.uk/sitemap_index.xml **************** 2) htaccess in the subdomain folder: ********************* Switch rewrite engine off in case this was installed under HostPay. RewriteEngine Off SetEnv DEFAULT_PHP_VERSION 53 DirectoryIndex index.cgi index.php BEGIN WordPress <ifmodule mod_rewrite.c="">RewriteEngine On
RewriteBase /WPnewsiteDee/
RewriteRule ^index.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /subdomain/index.php [L]</ifmodule> END WordPress **************** 2) robots.txt in the subdomain folder: ********************* this robots.txt file is empty **************** 3) htaccess in the Old Site folder: ********************* Deny from all *************** 3) robots.txt in the Old Site folder: ********************* User-agent: *
Disallow: / I have tried to be thorough so please excuse the length of my message here. I really hope one of you great people in the Moz community can help me with a solution. I have SEO knowledge I love SEO but I have not come across this before and I really don't know where to start with this one. Best Regards to you all and thank you for reading this. moz-site-crawl-report-image_zpsirfaelgm.jpg0 -
SEOMoz Q&A having some issues?
Apologies if this has been asked. I was browsing Q&A and got a Ruby error page, reloaded and everything was fine. However a bunch of the topics are kind of messed up, and it looks like a chunk of them are missing now. For example my 'questions I've answered' section of My Q&A only has the newest question and the oldest question, and when I sort the SEOMoz Tools category by newest I get one recent post and then the next one is from November of last year. You guys having database problems?
Moz Pro | | icecarats0 -
Google Inbound Links much higher than SEOMOZ
I just started here and really like everything the community and tools so far. One question. My SEOMOZ dashboard shows only one inbound link while Google Web Tools shows 300+. My site has been up for 3 months now and I'm curious about the lack of indexing. Any help, tips or feedback is greatly appreciated. http://CrossTrainingandFitness.com Robb Jarrett
Moz Pro | | carbbon0 -
SEOMoz Crawling Only 1 Page
I entered a new site into my dashboard 2 days ago - everything looked kosher, there were a few hundred pages crawled and a whole bunch of errors. I came back this morning to start work on the site and SEOMoz has crawled the site again, this time returning only 1 page and 0 errors. I haven't even logged in to the site since the first crawl, so I couldn't have broken anything. Has anyone seen this before?
Moz Pro | | Junction0 -
Do we have videos tutorials showing how to use the tools from SEOmoz?
I recently sign up with you guys and watch several videos but can't find tutorials on how to use the incredible tools here... please advice! Many thanks in advance. Bira
Moz Pro | | cssyes0 -
What Q&A Forum Platform Does SEOmoz Use?
Completely custom built, or a 3rd party offering that can be customized? One of the most usable forums I've seen!
Moz Pro | | SEOPA0 -
SeoMoz are creawling all the time. it´s fine?
HI about 1 week, the SeoMoz told me that was crawiling all the time . i don´t know if it a problem whit the system or is new software implemented. i´ts says about 20 days crawiling thanks
Moz Pro | | monotero0