Issue in number of pages crawled

cchhita

I wanted to figure out how our friend Roger Bot works.

On the first crawl of one of my large sites, the number of pages crawled stopped at 10000 (due to the restriction on the pro account). However, after a few weeks, the number of pages crawled went down to about 5500. This number seemed to be a more accurate count of the pages on our site.

Today, it seems that Roger Bot has completed another crawl and the number is up to 10000 again.

I know there has been no downtime on our site, and the items that we fixed on our site did not reduce or increase the number of pages we had.

Just making sure there are no known issues with Roger Bot before I look deeper into our site to see if there is an issue.

Thanks!

Marcus_Miller

Hey Chirag

That is the point: if the crawler is seeing multiple versions of the same page, you will get a false page count.

If a single page resolves on multiple versions of the URL like...

/pagename

/pagename/

/pagename.html

Then one single page could get reported as three pieces of content.

So, if you have 100 pages but every page resolves on, say, two URLs, the crawl would show 200 pages. The duplicate content report should let you see whether this is the case.
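To make that concrete, here is a rough Python sketch of how you could check a list of crawled URLs for this kind of inflation yourself. It is only an illustration: the example.com URLs are made up, the function names are hypothetical, and the normalization rule (drop a trailing slash and a ".html" suffix) is an assumption based on the three variants above, not how Roger Bot itself counts pages.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def normalize_path(url: str) -> str:
    """Collapse /pagename, /pagename/ and /pagename.html to one key (assumed rule)."""
    path = urlsplit(url).path
    if path.endswith(".html"):
        path = path[: -len(".html")]
    path = path.rstrip("/")
    return path or "/"


def group_variants(urls):
    """Map each normalized path to the list of crawled URLs that resolve to it."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize_path(url)].append(url)
    return dict(groups)


# Hypothetical crawl export: three variants of one page plus one other page.
crawled = [
    "http://example.com/pagename",
    "http://example.com/pagename/",
    "http://example.com/pagename.html",
    "http://example.com/other",
]

groups = group_variants(crawled)
print(len(crawled), "URLs crawled,", len(groups), "distinct pages")

# List only the pages that were counted more than once.
for path, variants in groups.items():
    if len(variants) > 1:
        print(path, "->", variants)
```

If the distinct count is far below the crawled count, duplicate URL variants are a likely cause of the inflated number.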

Hope that helps.
Marcus

cchhita

Hi Marcus,

Thanks for the reply.

Yes, the duplicate content report is quite large, but I am not certain why the number of pages crawled fluctuated by over 4000.

The duplicate content number went down by over 2000 last week, and then went straight back up again. So I am not sure if the crawler missed something, or if there was some other issue going on.

Cheers

Marcus_Miller

Hey Chirag

As a first suggestion, I would take a look at the duplicate content report. You may see some pages with multiple page names / URLs giving a falsely inflated page count.
