Robots review
-
Is there anything in this robots.txt that would have caused Rogerbot to stop indexing my site? It only saw 34 of 5,000+ pages on the last pass, and it had no problems seeing the whole site before.
User-agent: Rogerbot
Disallow: /default.aspx?*
// Keep from crawling the CMS urls default.aspx?Tabid=234. Real home page is home.aspx
Disallow: /ctl/
// Keep from indexing the admin controls
Disallow: ArticleAdmin
// Keep from indexing article admin page
Disallow: articleadmin
// same in lower case
Disallow: /images/
// Keep from indexing CMS images
Disallow: captcha
// keep from indexing the captcha image which appears to be a page to crawl
// general rules lacking wildcards
User-agent: *
Disallow: /default.aspx
Disallow: /images/
Disallow: /DesktopModules/DnnForge - NewsArticles/Controls/ImageChallenge.captcha.aspx
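As a rough self-check of the file above, here is a small sketch that flags lines not fitting the robots.txt grammar in RFC 9309 (the standard published in 2022, well after this thread). In that grammar, comments start with `#` rather than `//`, and Allow/Disallow paths start with `/`. The function name `lint_robots` and the trimmed sample are illustrative only, and the thread never says how Rogerbot itself treated such lines, so a flagged line is something to look at, not a confirmed cause of the short crawl.

```python
from __future__ import annotations


def lint_robots(text: str) -> list[tuple[int, str]]:
    """Return (line number, message) pairs for lines that break RFC 9309 syntax.

    A deliberately small check: it only looks at comment markers, missing
    "field: value" separators, and Allow/Disallow paths without a leading '/'.
    """
    problems = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and real comments are fine
        if line.startswith("//"):
            problems.append((number, "'//' is not a robots.txt comment marker; use '#'"))
            continue
        field, sep, value = line.partition(":")
        if not sep:
            problems.append((number, "no 'field: value' separator"))
            continue
        field = field.strip().lower()
        value = value.split("#", 1)[0].strip()  # ignore a trailing '#' comment
        # An empty value is legal (it means "no restriction"); anything else needs '/'.
        if field in ("allow", "disallow") and value and not value.startswith("/"):
            problems.append((number, f"path {value!r} does not start with '/'"))
    return problems


# Rogerbot group from the question, trimmed and with its line breaks restored.
ROGERBOT_GROUP = """\
User-agent: Rogerbot
Disallow: /default.aspx?*
// Keep from crawling the CMS urls default.aspx?Tabid=234
Disallow: /ctl/
Disallow: ArticleAdmin
Disallow: articleadmin
Disallow: /images/
Disallow: captcha
"""

if __name__ == "__main__":
    for number, message in lint_robots(ROGERBOT_GROUP):
        print(f"line {number}: {message}")
```

On the trimmed group it flags the `//` comment line and the three paths without a leading slash (`ArticleAdmin`, `articleadmin`, `captcha`).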
-
Well, our crawler is supposed to respect all standard robots.txt rules, so you should be good just adding them all back in as you normally would and seeing what happens. If it doesn't go through properly, I'll ask our engineers to take a look and find out what's happening!
-
Thanks Aaron.
I will add the rules back, as I want Roger to have nearly the same experience as Google and Bing.
Is it best to add one at a time? That could take over a month to figure out what's happening. Is there an easier way to test? Perhaps something like the Google Webmaster Tools Crawler Access tool?
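For a quick offline check that doesn't require waiting for a crawl, Python's standard `urllib.robotparser` module can test a list of URLs against a set of rules. This is a sketch with caveats: it is Python's parser, not Rogerbot's, and it matches paths by plain prefix without understanding `*` or `$` wildcards, so only wildcard-free rules (such as the `User-agent: *` group quoted in the question) give meaningful results. The `blocked_urls` helper and the example.com URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Wildcard-free "User-agent: *" group from the robots.txt in the question.
GENERAL_RULES = [
    "User-agent: *",
    "Disallow: /default.aspx",
    "Disallow: /images/",
]


def blocked_urls(rules, agent, urls):
    """Return the URLs that the given rules disallow for the given user agent."""
    parser = RobotFileParser()
    parser.parse(rules)  # parse() takes an iterable of lines; no network access needed
    return [url for url in urls if not parser.can_fetch(agent, url)]


SAMPLE_URLS = [
    "https://www.example.com/home.aspx",
    "https://www.example.com/default.aspx?Tabid=123&Key=Var",
    "https://www.example.com/Default.aspx",  # path matching is case-sensitive
    "https://www.example.com/images/logo.png",
]

if __name__ == "__main__":
    for url in blocked_urls(GENERAL_RULES, "rogerbot", SAMPLE_URLS):
        print("blocked:", url)
```

With these sample URLs, the prefix rule blocks the query-string URL and the image, while `/Default.aspx` gets through because matching is case-sensitive. That is presumably why the question's file lists `ArticleAdmin` and `articleadmin` separately.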
-
Hey! Sorry you didn't have a good experience with your help ticket. I talked with Chiaryn, and it sounds like there was some confusion over what you wanted removed from your crawl; the ticket mentioned that you wanted only one particular page blocked. I think she found something different in your robots.txt (the rules you outline above), so she tried to help you with that situation. Roger does honor all robots.txt parameters, so the crawl should only be limited in the way you define, though wildcard rules can end up blocking much more than you intend.
It looks like you've since removed your restrictions for Roger. Chiaryn and I spoke about it, and we'll try to help with your specific site through your ticket. Hope this helps explain! If you want to re-add those parameters and then see which pages are wrongly blocked, I'd love to do that with you; just let us know when you've changed the robots.txt.
-
All URLs are rewritten to default.aspx?Tabid=123&Key=Var. None of these are publicly visible once the rewriter is active. I added the rule just to make sure the page is never accidentally exposed and indexed.
-
Could you clarify the URL structure for default.aspx and the true home page? I ask only because if you add Disallow: /default.aspx?* (with the wildcard), it will disallow all pages within the /default.aspx folder structure. Just use the same rule for rogerbot as you did for the general rule, which is Disallow: /default.aspx
Hope this helps,
Vahe
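To see what each of the two rules discussed here actually matches, here is a small matcher written to Google-style semantics: a prefix match on the path plus query string, where `*` stands for any run of characters and a trailing `$` anchors the end of the URL. The thread does not say whether Rogerbot used exactly these semantics, so treat that as an assumption. `rule_matches` is a hypothetical helper.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt Allow/Disallow pattern matches a URL path (plus query).

    Assumes Google-style semantics: prefix match, '*' matches any run of
    characters, and a trailing '$' means the URL must end there.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every literal piece, then join the pieces with '.*' where the '*'s were.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None  # re.match anchors at the start: a prefix match


if __name__ == "__main__":
    for rule in ("/default.aspx", "/default.aspx?*"):
        for path in ("/default.aspx", "/default.aspx?Tabid=234", "/home.aspx"):
            print(f"{rule!r:20} {path!r:28} {rule_matches(rule, path)}")
```

Under these assumptions, both rules block `/default.aspx?Tabid=234` and neither touches `/home.aspx`. The plain `/default.aspx` rule additionally blocks the bare `/default.aspx` URL. A trailing `*` adds nothing, because every rule is already a prefix match.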
-
Actually, I asked the help team this question (essentially) first, and the lady said she wasn't a web developer and that I should ask the community. Frankly, I was a little taken aback.
-
I can't. Default.aspx is the root of the CMS, and the redirect would take down the entire website. The rule only exists because, for a short period, Google indexed the page incorrectly.
-
Hi,
If I were you, I would 301 redirect default.aspx to the real home page. Once you do that, simply remove it from the robots.txt file.
Not only would you strengthen the true home page, you would also prevent crawl errors from occurring.
There is also a concern that people might still link to default.aspx, which might be causing search engines to index the page. This might be the reason rogerbot has stopped crawling your site.
If that's an issue, just add a canonical tag for that URL, but still remove that reference.
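At the HTTP level, a 301 is just a status code plus a `Location` header. The sketch below shows this as Python WSGI middleware rather than in the site's ASP.NET/DNN stack, so it illustrates the HTTP behavior, not how this CMS would be configured. This thread raises the concern that redirecting Default.aspx would take down the whole site. To address that, the sketch assumes it should redirect only a bare `/default.aspx` request with an empty query string, letting `default.aspx?Tabid=...` URLs through. Whether that scoping is safe for this particular CMS is not established here. `redirect_default`, `cms_app`, and `status_for` are hypothetical names.

```python
HOME_PAGE = "/home.aspx"  # the "real home page" named in the thread


def redirect_default(app):
    """Wrap a WSGI app so a bare /default.aspx request gets a permanent redirect."""
    def wrapper(environ, start_response):
        path = environ.get("PATH_INFO", "")
        query = environ.get("QUERY_STRING", "")
        # Assumptions: IIS-style case-insensitive paths, and only the bare URL is
        # redirected; CMS URLs that carry a query string pass through untouched.
        if path.lower() == "/default.aspx" and not query:
            start_response("301 Moved Permanently", [("Location", HOME_PAGE)])
            return [b""]
        return app(environ, start_response)
    return wrapper


def cms_app(environ, start_response):
    """Hypothetical stand-in for the CMS: answers every request with a page."""
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<html><body>page</body></html>"]


def status_for(app, path, query=""):
    """Call a WSGI app with a minimal environ and report (status, Location header)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["location"] = dict(headers).get("Location")

    app({"PATH_INFO": path, "QUERY_STRING": query}, start_response)
    return captured["status"], captured["location"]


if __name__ == "__main__":
    site = redirect_default(cms_app)
    print(status_for(site, "/default.aspx"))
    print(status_for(site, "/default.aspx", "Tabid=234"))
```

The canonical-tag fallback is an HTML element in the page's `<head>`, for example `<link rel="canonical" href="https://www.example.com/home.aspx">`, with example.com standing in for the real domain.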
Hope this helps,
Vahe
-
Hi! If you don't get an answer from the community by Monday, send an email to help@seomoz.org and they'll look at it to see what might be the problem (they're not in on the weekends, otherwise I'd have you send them an email right away).
Thanks!
Keri
Related Questions
-
Our crawler was not able to access the robots.txt file on your site.
Good morning. Yesterday, Moz gave me an error saying it wasn't able to find our robots.txt file. This is a new occurrence, though; we've used Moz and its crawling ability many times before, so I'm not sure why the error is happening now. I validated that the redirects and our robots page are operational and that nothing in our robots.txt disallows Roger. Any advice or guidance would be much appreciated. https://www.agrisupply.com/robots.txt Thank you for your time. -Danny
Moz Pro | Danny_Gallagher
How to deal with fake review posted on ripoffreport.com by a competitor.
In 2010 we launched a product update (a physical product) that was really good for our sales, and we were taking a lot of business from our main competitor. To stop the bleeding, they posted a fake Ripoff Report (completely made up, pretending to be a former employee; we are a family business, employ few people, and have had almost no turnover in the last 10 years). They also posted fake reviews on Google's local product at the time for every store that used to use their product but had switched to ours. We were able to get Google to remove the fake reviews, as there was no way one user visited over 100 stores and purchased the same service on the same day in all these locations around the US. But they would do nothing about delisting the report page without a court order. So the fake Google reviews went away, but the Ripoff Report has become immortal. The reviews were originally posted anonymously and then commented on by another anonymous user. These same two anonymous users then filed Ripoff Reports against a couple of our mutual customers as well. Since they are anonymous, we cannot sue anyone to get them to remove it, and since it is past the statute of limitations, we cannot file a John Doe lawsuit to get a default judgment. So the report is there to stay. We have worked to get more content up about us: we have great product reviews on Facebook and other outlets that have sold and spotlighted our products, and we are partnering with industry-specific bloggers and traditional media content sites to get links and reviews (all white-hat stuff, great public relations stuff). But we cannot get our interior pages, Facebook pages, or the other reviews to rank higher than this report when searching for our brand name, and even worse, when you search for our "brand name reviews" it is the first result on Google. Can anyone help me understand how I can use Moz to identify how to outrank this page with interior pages so that it falls off the front page of Google?
Sorry if that is a newbie question, but I have tried a lot of things, and they have worked somewhat but not as much as I need them to. And it seems that in the last few weeks the report has become stronger in the rankings again. Any suggestions you could offer would be greatly appreciated.
Moz Pro | erickcalderon
Website blocked by Robots.txt in OSE
When viewing my client's website in OSE under the Top Pages tab, it shows that ALL pages are blocked by Robots.txt. This is extremely concerning because Google Webmaster Tools is showing me that all pages are indexed and OK. No crawl errors, no messages, no nothing. I did a "site:website.com" in Google and all of the pages of the website returned. Any thoughts? Where is OSE picking up this signal? I cannot find a blocked robots tag in the code or anything.
Moz Pro | ConnellyPartners
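Since the poster mentions looking for a robots tag in the code, one quick way to check a saved page is to pull out every robots meta tag with Python's standard `html.parser`. This is a sketch: `robots_meta` is a hypothetical helper, the HTML is a made-up sample, and it only inspects markup, so it says nothing about robots.txt itself or about where OSE gets its signal. Crawler-specific names such as `googlebot` (which Google documents) can be passed in as well.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the content of <meta name="..."> tags whose name is in `names`."""

    def __init__(self, names=("robots",)):
        super().__init__()
        self.names = {name.lower() for name in names}
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names; values may be None.
        if tag != "meta":
            return
        attributes = {key: (value or "") for key, value in attrs}
        if attributes.get("name", "").lower() in self.names:
            self.directives.append(attributes.get("content", "").lower())


def robots_meta(html, names=("robots",)):
    """Return the robots directives found in an HTML document, in document order."""
    finder = RobotsMetaFinder(names)
    finder.feed(html)
    finder.close()
    return finder.directives


SAMPLE_PAGE = (
    '<html><head>'
    '<meta name="robots" content="index,follow">'
    '<meta name="googlebot" content="noindex" />'
    '</head><body></body></html>'
)

if __name__ == "__main__":
    print(robots_meta(SAMPLE_PAGE))
    print(robots_meta(SAMPLE_PAGE, names=("robots", "googlebot")))
```

Self-closing tags such as `<meta ... />` are still reported, because `HTMLParser`'s default `handle_startendtag` forwards them to `handle_starttag`.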
Site Redesign Launch - How Can I Crawl for Immediate Review?
I just redesigned my site and want to have a crawl done to check for errors or any items that need to be cleaned up. Does anyone know how I can do this, since SEOmoz only crawls once per week? Thanks!
Moz Pro | creativemobseo
Does SEOmoz recognize duplicate URLs blocked in robots.txt?
Hi there: Just a newbie question... I found some duplicate URLs in the "SEOmoz Crawl diagnostic reports" that should not be there. They are intended to be blocked by the site's robots.txt file. Here is an example URL (Joomla + VirtueMart structure): http://www.domain.com/component/users/?view=registration and here is the blocking content in the robots.txt file:
User-agent: *
Disallow: /components/
Question is: Will this kind of duplicate URL error be removed from the error list automatically in the future? Should I remember which errors should not really be in the error list? What is the best way to handle this kind of error? Thanks and best regards, Franky
Moz Pro | Viada
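For the example in this question, Python's standard `urllib.robotparser` offers a quick sanity check of whether the quoted rule covers the quoted URL. This is only a sketch, with Python's parser standing in for Moz's crawler and the question's placeholder domain used as-is. It reports the registration URL as fetchable, because robots.txt rules are prefix matches and `/component/users/...` does not start with `/components/`.

```python
from urllib.robotparser import RobotFileParser

# The blocking rule quoted in the question.
RULES = [
    "User-agent: *",
    "Disallow: /components/",
]

parser = RobotFileParser()
parser.parse(RULES)

REGISTRATION_URL = "http://www.domain.com/component/users/?view=registration"  # from the question
COVERED_URL = "http://www.domain.com/components/com_users/"  # hypothetical URL the rule does cover

if __name__ == "__main__":
    for url in (REGISTRATION_URL, COVERED_URL):
        print(parser.can_fetch("rogerbot", url), url)
```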
Why does the SEOmoz crawler ignore robots.txt?
The SEOmoz crawler ignores robots.txt. It also "indexes" pages marked as noindex. That means it is filling up the reports with things that don't matter. Is there any way to stop it from doing that?
Moz Pro | loopyal
Blocking all robots except rogerbot
I'm in the process of working with a site under development and wish to run the SEOmoz crawl test before we launch it publicly. Unfortunately, rogerbot is reluctant to crawl the site. I've set my robots.txt to disallow all bots besides rogerbot. It currently looks like this:
User-agent: *
Disallow: /
User-agent: rogerbot
Disallow:
All pages within the site are meta tagged index,follow. The crawl report says "Search Engine blocked by robots.txt: Yes". Am I missing something here?
Moz Pro | ignician
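Python's standard `urllib.robotparser` can show how a typical parser reads the file in this question. In this sketch (Python's parser, not Rogerbot's, and a placeholder example.com URL), the empty `Disallow:` in the rogerbot group means "allow everything", so rogerbot may fetch pages, while other crawlers fall back to the `*` group and are blocked. That the file parses as intended here does not explain the crawl report. It only rules out a syntax problem for parsers that behave like this one.

```python
from urllib.robotparser import RobotFileParser

# robots.txt from the question: block every crawler except rogerbot.
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: rogerbot
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, url: str) -> bool:
    """True if this parser would let `agent` fetch `url`."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("rogerbot", "Googlebot"):
        print(agent, allowed(agent, "https://www.example.com/"))
```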
What are the names of the SEOmoz and Open Site Explorer robots?!
I would like to exclude the SEOmoz and Open Site Explorer bots in robots.txt so they don't index my sites… what are their names?
Moz Pro | cezarylech