Roger bot taking a long time to crawl site

caterfor

Hi all, I've noticed Roger bot is taking a long time to crawl my new site. It started on the 28th Feb 2013 and is still going. There aren't many pages at the moment. Any ideas please?

thanks a lot, Mark.

caterfor

Hi Peter

Thanks for your reply. The crawl has now completed and given me some more areas to work on. It's a great tool.

I was so preoccupied with 'hiding' the site over the last couple of months with the easy code:

User-agent: *
Disallow: /

I hadn't thought beyond this.

I've noticed Google has now recognised the new robots.txt, which has allowed the sitemap to be accepted.

I'll look at your notes, thank you, and work out my next move. I'll let you know how I get on too.

I know (well think) I have to get noindex, follow for 'sorted' category pages...

all the best, Mark.

caterfor

Hi Mike

The crawl has now completed, thank you. I think the results will keep me occupied.

all the best, Mark.

Peterli

Hi Mark,

Sorry it's taking a while to crawl your new site.

While I'm not exactly sure what is causing the delay, one possible reason is your robots.txt. Here's a short snippet of what I see in it:

# Crawlers Setup
User-agent: *
Crawl-delay: 30
# Allowable Index
Allow: /*?p=
Allow: /index.php/blog/
Allow: /catalog/seo_sitemap/category/
Allow: /catalogsearch/result/
Allow: /media/
# Directories
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /errors/
Disallow: /includes/
Disallow: /js/
Disallow: /lib/
Disallow: /magento/
Disallow: /pkginfo/
Disallow: /report/

From here, the formatting looks a little awkward. What's going on is that you're telling Roger bot to only look at these:

# Allowable Index
Allow: /*?p=
Allow: /index.php/blog/
Allow: /catalog/seo_sitemap/category/
Allow: /catalogsearch/result/
Allow: /media/

While the syntax is OK, not every crawler out there will follow the Allow directive. Here's an example of something you can use instead:

# Crawlers Setup
User-agent: *
Crawl-delay: 30
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /errors/
Disallow: /includes/
Disallow: /js/

With this, you're telling the crawler to disallow nothing except these directories. Please let us know whether implementing this actually fixes the crawl.
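
If you'd like to sanity-check rules like these before uploading them, here's a minimal sketch using Python's standard-library `urllib.robotparser`. A few assumptions to flag: `example.com` and the page paths are placeholders rather than your real URLs, `rogerbot` is used as the user-agent token, and the 1,000-page figure is purely hypothetical. It compares a "hide everything" file with a directory-only file, then gives a rough lower bound on crawl time if a crawler waits the full Crawl-delay between requests. Python's parser does simple prefix matching and does not understand wildcards such as `/*?p=`, so the Allow lines are left out here. Real crawlers may interpret things differently.

```python
from urllib.robotparser import RobotFileParser

# Rules that hide the whole site from every crawler.
HIDE_EVERYTHING = """
User-agent: *
Disallow: /
"""

# Rules that block only a handful of directories (assumed intent).
BLOCK_DIRECTORIES = """
User-agent: *
Crawl-delay: 30
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /errors/
Disallow: /includes/
Disallow: /js/
"""


def load_rules(text):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(text.strip().splitlines())
    return parser


def estimated_crawl_hours(page_count, delay_seconds):
    """Rough lower bound: one page fetched per delay interval."""
    return page_count * delay_seconds / 3600


# Placeholder URLs; swap in pages from the real site.
urls = [
    "http://example.com/",
    "http://example.com/some-category.html",
    "http://example.com/app/etc/",
]

for name, text in [("hide everything", HIDE_EVERYTHING),
                   ("block directories", BLOCK_DIRECTORIES)]:
    parser = load_rules(text)
    for url in urls:
        allowed = parser.can_fetch("rogerbot", url)
        print(f"{name:18} {url:40} allowed={allowed}")

# Crawl-delay is read from the matching User-agent group (here the * group).
delay = load_rules(BLOCK_DIRECTORIES).crawl_delay("rogerbot")
print("crawl delay (s):", delay)
print("hours for 1000 pages (hypothetical):", round(estimated_crawl_hours(1000, delay), 1))
```

The takeaway is that a bare `Disallow: /` blocks every path, while the directory-only version leaves ordinary pages crawlable. A 30-second delay on its own can also stretch a crawl to several hours, even on a fairly small site.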

Thanks for reaching out!

Best,

Peter Li
SEOmoz Help Team

Mike.Goracke

Hi Mark,

This sounds like a bug or issue with the SEOmoz software.

Contact help@seomoz.org and ask one of the help associates to look into this for you.

If you do not have many pages, it definitely shouldn't take that long.

The help team responds extremely quickly!

Good luck.

Mike
