Blocking all robots except rogerbot
-
I'm working on a site that's under development and want to run the SEOmoz crawl test before we launch it publicly. Unfortunately, rogerbot is reluctant to crawl the site. I've set my robots.txt to disallow all bots except rogerbot.
It currently looks like this:
User-agent: *
Disallow: /

User-agent: rogerbot
Disallow:
All pages within the site are meta tagged index,follow.
Crawl report says:
Search Engine blocked by robots.txt Yes
Am I missing something here?
-
...actually I take that back. Still reporting as blocked by robots.txt.
Going to email the team.
-
Thanks, it appears to be crawling without issue now.
-
And if that still doesn't work, email help@seomoz.org and they'll help you figure out the right way to let Roger in while excluding everyone else.
-
You've got it upside down.
Roger sees the first * and then goes "okay :(" and goes away.
Simply change it to:
User-agent: rogerbot
Disallow:

User-agent: *
Disallow: /
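If you want to double-check the file before re-running the crawl, Python's standard-library robots.txt parser applies the same per-agent group matching; here's a minimal sketch (the example.com URL is just for illustration):

```python
from urllib import robotparser

# The corrected robots.txt suggested above.
RULES = """\
User-agent: rogerbot
Disallow:

User-agent: *
Disallow: /
"""

parser = robotparser.RobotFileParser()
parser.parse(RULES.splitlines())

# rogerbot matches its own group; an empty Disallow permits everything.
print(parser.can_fetch("rogerbot", "http://example.com/any-page"))   # True
# Every other bot falls through to the * group and is blocked.
print(parser.can_fetch("Googlebot", "http://example.com/any-page"))  # False
```

Note that a spec-compliant parser picks the matching agent group regardless of file order, but putting the rogerbot group first also covers crawlers that naively stop at the first match.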
Related Questions
-
Robots.txt file issues on Shopify server
We have repeated issues with one of our ecommerce sites not being crawled. We receive the following message: Our crawler was not able to access the robots.txt file on your site. This often occurs because of a server error from the robots.txt. Although this may have been caused by a temporary outage, we recommend making sure your robots.txt file is accessible and that your network and server are working correctly. Typically errors like this should be investigated and fixed by the site webmaster. Read our troubleshooting guide. Are you aware of an issue with robots.txt on the Shopify servers? It is happening at least twice a month so it is quite an issue.
Moz Pro | A_Q
Our crawler was not able to access the robots.txt file on your site.
Good morning, Yesterday, Moz gave me an error saying it wasn't able to find our robots.txt file. This is a new occurrence; we've used Moz and its crawling ability many times before, and I'm not sure why the error is happening now. I validated that the redirects and our robots page are operational and that nothing in our robots.txt is disallowing Roger. Any advice or guidance would be much appreciated. https://www.agrisupply.com/robots.txt Thank you for your time. -Danny
Moz Pro | Danny_Gallagher
Same linking c-blocks trend as competitor
I noticed in our competitive link report that our number of linking c-blocks has risen and fallen in the exact same pattern as one of our competitors. Is there a reason why this would be happening?
Moz Pro | ZoomInformation
Allow only Rogerbot, not googlebot nor undesired access
I'm in the middle of site development and want to start crawling my site with Rogerbot while keeping Googlebot and similar crawlers out. My site is currently protected with a login (basic Joomla offline site, username and password required), so I thought a good solution would be to remove that limitation and use .htaccess to password-protect it for all users except Rogerbot. Reading here and there, it seems that practice is not recommended, as it could lead to security holes: any other user could see the allowed agents and emulate them. Maybe you'd need to be a hacker/cracker, or an experienced developer, to get that info, but I wasn't able to find clear information on how to proceed securely. The other option was to keep using Joomla's access limitation for everyone except Rogerbot, though I'm still not sure how feasible that is. Mostly, my question is: how do you work on your site before you want it indexed by Google or similar, whether or not you use a CMS? Is there some other way to do this? I would love to have my site ready and crawled before launching it, and avoid fixing issues afterwards. Thanks in advance.
Moz Pro | MilosMilcom
Data Update for RogerBot
Hi, I noticed that rogerbot still gives me a 404 for http://www.salustore.com/capelli/nanogen-acquamatch.html (referral from http://www.salustore.com/protocollo-nanogen) even though I made changes a couple of weeks ago. Same with a "Title Element Too Short" error on our site. Any suggestion on how to refresh it? Best regards, n.
Moz Pro | nicolobottazzi
Does SEOmoz notice duplicated URLs blocked in robots.txt?
Hi there: Just a newbie question... I found some duplicated URLs in the "SEOmoz Crawl diagnostic reports" that should not be there. They are intended to be blocked by the robots.txt file. Here is an example URL (Joomla + VirtueMart structure): http://www.domain.com/component/users/?view=registration and here is the blocking content in the robots.txt file: User-agent: * Disallow: /components/ Question is: will this kind of duplicated-URL error be removed from the error list automatically in the future? Should I keep track of which errors shouldn't really be in the error list? What is the best way to handle these errors? Thanks and best regards, Franky
Moz Pro | Viada
Rogerbot Ignoring Robots.txt?
Hi guys, We're trying to stop Rogerbot from spending 8,000-9,000 of our 10,000-pages-per-week site crawl on our zillions of PhotoGallery.asp pages. Unfortunately our e-commerce CMS isn't tremendously flexible, so the only way we believe we can block Rogerbot is in our robots.txt file. Rogerbot keeps crawling all these PhotoGallery.asp pages, so it's making our crawl diagnostics really useless. I've contacted the SEOmoz support staff and they claim the problem is on our side. This is the robots.txt we are using:
User-agent: rogerbot
Disallow: /PhotoGallery.asp
Disallow: /pindex.asp
Disallow: /help.asp
Disallow: /kb.asp
Disallow: /ReviewNew.asp

User-agent: *
Disallow: /cgi-bin/
Disallow: /myaccount.asp
Disallow: /WishList.asp
Disallow: /CFreeDiamondSearch.asp
Disallow: /DiamondDetails.asp
Disallow: /ShoppingCart.asp
Disallow: /one-page-checkout.asp

Sitemap: http://store.jrdunn.com/sitemap.xml
For some reason the WYSIWYG editor is entering extra spaces, but those are all single-spaced. Any suggestions? The only other thing I thought of is something like "Disallow: /PhotoGallery.asp*" with a wildcard.
Moz Pro | kellydallen
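A side note on the wildcard idea in that question: in the original robots.txt rules, Disallow is a prefix match, so /PhotoGallery.asp should already cover query-string variants without a trailing *. A quick sanity check with Python's standard-library parser (the URLs and the two-rule file are an illustrative reconstruction, not the full file from the question):

```python
from urllib import robotparser

# Hypothetical excerpt of the rogerbot group from the question above.
RULES = """\
User-agent: rogerbot
Disallow: /PhotoGallery.asp
Disallow: /pindex.asp
"""

parser = robotparser.RobotFileParser()
parser.parse(RULES.splitlines())

# Disallow is a prefix match, so query-string variants are covered
# without any trailing wildcard.
print(parser.can_fetch("rogerbot", "http://store.jrdunn.com/PhotoGallery.asp?id=123"))  # False
# Paths that don't start with a disallowed prefix remain crawlable.
print(parser.can_fetch("rogerbot", "http://store.jrdunn.com/index.html"))  # True
```

If a spec-compliant parser reports the URLs as blocked, as here, the rules themselves are fine and the issue lies elsewhere (e.g. serving or caching of the file).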
How to get rid of the message "Search Engine blocked by robots.txt"
During the Crawl Diagnostics of my website, I got the message "Search Engine blocked by robots.txt" under Most Common Errors & Warnings. Please let me know the procedure by which the SEOmoz PRO crawler can completely crawl my website. Awaiting your reply at the earliest. Regards, Prashakth Kamath
Moz Pro | 1prashakth