Robots review
-
Anything in this that would have caused Rogerbot to stop indexing my site? It only saw 34 of 5000+ pages on the last pass. It had no problems seeing the whole site before.
User-agent: Rogerbot
Disallow: /default.aspx?*
// Keep from crawling the CMS urls default.aspx?Tabid=234. Real home page is home.aspx
Disallow: /ctl/
// Keep from indexing the admin controls
Disallow: ArticleAdmin
// Keep from indexing article admin page
Disallow: articleadmin
// same in lower case
Disallow: /images/
// Keep from indexing CMS images
Disallow: captcha
// keep from indexing the captcha image which appears to be a page to crawls.
// General rules lacking wildcards
User-agent: *
Disallow: /default.aspx
Disallow: /images/
Disallow: /DesktopModules/DnnForge - NewsArticles/Controls/ImageChallenge.captcha.aspx
-
Well, our crawler is supposed to respect all standard robots.txt rules, so you should be good just adding them all back in as you normally would and seeing what happens. If it doesn't go through properly, I'll ask our engineers to take a look and find out what's happening!
-
Thanks Aaron.
I will add the rules back, as I want Roger to have nearly the same experience as Google and Bing.
Is it best to add one at a time? That could take over a month to figure out what's happening. Is there an easier way to test? Perhaps something like the Google Webmaster Tools Crawler Access tool?
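One way to check rules locally before waiting on another crawl is a small script. The following is only a rough sketch: it assumes the crawler follows the common wildcard convention (`*` matches any run of characters, a trailing `$` anchors the end of the URL, and everything else is a prefix match), which this thread does not confirm for Rogerbot. The function names and sample paths are made up for illustration. It ignores Allow lines and rule precedence, and it leaves out the rules with no leading slash (ArticleAdmin, articleadmin, captcha) because how a crawler treats those is not established here.

```python
import re

def rule_to_regex(pattern):
    """Turn a robots.txt path pattern into a regex anchored at the start.

    Assumed convention: '*' matches any run of characters and a
    trailing '$' anchors the end of the URL path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def is_blocked(path, disallow_rules):
    """Return True if any non-empty Disallow pattern matches the path."""
    return any(rule_to_regex(rule).match(path) for rule in disallow_rules if rule)

# Rules taken from the robots.txt in the question (leading-slash rules only).
ROGERBOT_RULES = ["/default.aspx?*", "/ctl/", "/images/"]
GENERAL_RULES = ["/default.aspx", "/images/"]

# Hypothetical sample paths; replace with real URLs from the site.
SAMPLE_PATHS = ["/home.aspx", "/default.aspx", "/default.aspx?Tabid=234", "/images/logo.png"]

if __name__ == "__main__":
    for path in SAMPLE_PATHS:
        print(path,
              "| Rogerbot rules block:", is_blocked(path, ROGERBOT_RULES),
              "| general rules block:", is_blocked(path, GENERAL_RULES))
```

Running a list of the site's real URLs through both rule sets shows at a glance which rule would block which page, so the rules do not have to be re-added one at a time.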
-
Hey! Sorry you didn't have a good experience with your help ticket. I talked with Chiaryn and it sounds like there was some confusion over what you wanted removed from your crawl; it had mentioned that you wanted only one particular page blocked. I think she found something different in your robots.txt - the rules you outline above - so she tried to help you with that situation. Roger does honor all robots.txt parameters so the crawl should only be limited in the way you define, though the wildcards do open you up to a lot of blockage.
It looks like you've since removed your restrictions from Roger. Chiaryn and I spoke about it and we'll try to help with your specific site over your ticket. Hope this helps explain! If you want to re-add those parameters and then see what pages are wrongly blocked, I'd love to do that with you - just let us know when you've changed the robots.txt.
-
All urls are rewritten to default.aspx?Tabid=123&Key=Var. None of these are publicly visible once the re-writer is active. I added the rule just to make sure the page is never accidentally exposed and indexed
-
Could you clarify the URL structure for default.aspx and the true home page? I ask because if you add Disallow: /default.aspx?* (with the wildcard), it will disallow every URL under /default.aspx that carries a query string. Just use the same rule for Rogerbot as you did for the general rule, that is, Disallow: /default.aspx
Hope this helps,
Vahe
-
Actually, I asked Help this question (essentially) first, then the lady said she wasn't a web developer and that I should ask the community. I was a little taken aback, frankly.
-
Can't. Default.aspx is the root of the CMS, and the redirect would take down the entire website. The rule exists only because of a brief period when Google indexed the page incorrectly.
-
Hi,
If I were you, I would 301 redirect default.aspx to the real home page. Once you do that, simply remove it from the robots.txt file.
Not only would you strengthen the true home page, but you would also prevent crawl errors from occurring.
One concern is that people might still link to default.aspx, which might be causing search engines to index the page. This might be the reason Rogerbot has stopped crawling your site.
If that's an issue, just put a canonical tag on that URL, but still remove that reference from robots.txt.
Hope this helps,
Vahe
-
Hi! If you don't get an answer from the community by Monday, send an email to help@seomoz.org and they'll look at it to see what might be the problem (they're not in on the weekends, otherwise I'd have you send them an email right away).
Thanks!
Keri