Roger bot taking a long time to crawl site
-
Hi all, I've noticed Roger bot is taking a long time to crawl my new site. It started on the 28th Feb 2013 and is still going. There aren't many pages at the moment. Any ideas please?
thanks a lot, Mark.
-
Hi Peter
Thanks for your reply. The crawl has now completed and given me some more areas to work on; it's a great tool.
I was so preoccupied with 'hiding' the site over the last couple of months with the easy code:
User-agent: *
Disallow: /
I hadn't thought beyond this.
I've noticed Google has now recognised the new robots.txt, which has allowed the sitemap to be accepted.
I'll look at your notes, thank you, and work out my next move. I'll let you know how I get on too.
I know (well think) I have to get noindex, follow for 'sorted' category pages...
all the best, Mark.
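
For anyone reading later, here is a rough Python sketch of the noindex, follow idea for sorted category pages. It assumes the sort options show up as query parameters named `dir` and `order`; those names, the `robots_meta` helper and the example.com URLs are placeholders for illustration, so check what your own store actually uses.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical sort parameters (an assumption, not taken from the thread).
SORT_PARAMS = {"dir", "order"}


def robots_meta(url: str) -> str:
    """Return the meta robots tag a page at this URL might carry."""
    query = parse_qs(urlparse(url).query)
    # A sorted view repeats the category content, so keep it out of the
    # index but still let crawlers follow its links.
    if SORT_PARAMS.intersection(query):
        return '<meta name="robots" content="noindex, follow">'
    return '<meta name="robots" content="index, follow">'


print(robots_meta("https://example.com/shoes?dir=asc&order=price"))
print(robots_meta("https://example.com/shoes"))
```

The tag itself still has to be written into the page head by the shop template; the function only decides which variant applies.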
-
Hi Mike
The crawl has now completed, thank you. I think the results will keep me occupied.
all the best, Mark.
-
Hi Mark,
Sorry it's taking a while to crawl your new site.
While I'm not exactly sure what is causing the delay, one possible reason is your robots.txt. Here's a short snippet of what I see in it:

# Crawlers Setup
User-agent: *
Crawl-delay: 30
# Allowable Index
Allow: /*?p=
Allow: /index.php/blog/
Allow: /catalog/seo_sitemap/category/
Allow: /catalogsearch/result/
Allow: /media/
# Directories
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /errors/
Disallow: /includes/
Disallow: /js/
Disallow: /lib/
Disallow: /magento/
Disallow: /pkginfo/
Disallow: /report/

From here, the formatting looks a little awkward. What's going on is that you're telling Roger bot to only look at these:
# Allowable Index
Allow: /*?p=
Allow: /index.php/blog/
Allow: /catalog/seo_sitemap/category/
Allow: /catalogsearch/result/
Allow: /media/

While the syntax is OK, not every crawler out there will follow the Allow directive. Here's an example of something you can use:
# Crawlers Setup
User-agent: *
Crawl-delay: 30
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /errors/
Disallow: /includes/
Disallow: /js/

From here you're telling the crawler to disallow nothing except these directories. Please let us know, once you implement this, whether it actually fixes the crawl.

Thanks for reaching out!
Best,
Peter Li
SEOmoz Help Team
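
One way to sanity-check a robots.txt before uploading it is Python's standard-library parser. The sketch below is only an approximation: the stdlib parser does simple prefix matching and may not interpret the file exactly as Roger bot or any other crawler does. The example.com URLs, the `rogerbot` user-agent string and the `build_parser` helper are placeholders for illustration.

```python
from urllib.robotparser import RobotFileParser

# The "hide everything" file: a bare "Disallow: /" blocks the whole site.
HIDE_ALL = """\
User-agent: *
Disallow: /
"""

# A file that blocks only the listed directories.
BLOCK_DIRECTORIES = """\
# Crawlers Setup
User-agent: *
Crawl-delay: 30
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /errors/
Disallow: /includes/
Disallow: /js/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network request)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


hidden = build_parser(HIDE_ALL)
selective = build_parser(BLOCK_DIRECTORIES)

page = "https://example.com/some-category/some-product.html"
script = "https://example.com/js/app.js"

print(hidden.can_fetch("rogerbot", page))       # whole site blocked
print(selective.can_fetch("rogerbot", page))    # ordinary page is crawlable
print(selective.can_fetch("rogerbot", script))  # /js/ stays blocked
print(selective.crawl_delay("rogerbot"))        # seconds between requests
```

The last line is worth a look too: a crawler that honours Crawl-delay: 30 waits 30 seconds between requests, which by itself makes a crawl slow.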
-
Hi Mark,
This sounds like a bug or issue with the SEOmoz software.
Contact help@seomoz.org and ask one of the help associates to look into this for you.
If you do not have many pages, it definitely shouldn't take that long.
The help team responds extremely quickly!
Good luck.
Mike