Issue in number of pages crawled
-
I wanted to figure out how our friend Roger Bot works.
On the first crawl of one of my large sites, the number of pages crawled stopped at 10,000 (due to the restriction on the Pro account). However, after a few weeks, the number of pages crawled went down to about 5,500. This number seemed to be a more accurate count of the pages on our site.
Today, it seems that Roger Bot has completed another crawl and the number is up to 10,000 again.
I know there has been no downtime on our site, and the items that we fixed on our site did not reduce or increase the number of pages we had.
Just making sure there are no known issues with Roger Bot before I look deeper into our site to see if there is an issue.
Thanks!
-
Hey Chirag
That is the point: if the crawler is seeing multiple versions of the same page, you will get a false page count.
If a single page resolves on multiple versions of the URL like...
/pagename
/pagename/
/pagename.html
Then one single page could get reported as three pieces of content.
So, if you have 100 pages, but every page resolves on, say, two page names, then the crawl would show 200 pages. The duplicate content report should allow you to see if this is the case.
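To make the arithmetic concrete, here is a minimal Python sketch of how URL variants inflate a page count. It is not how Roger Bot works internally. It assumes the variants differ only by a trailing slash or a ".html" suffix, and the names canonical_key and group_variants and the example.com URLs are made up for illustration.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def canonical_key(url: str) -> str:
    """Collapse /pagename, /pagename/ and /pagename.html to one key.

    Assumed rule for illustration only; a real site may need different rules.
    """
    parts = urlsplit(url)
    path = parts.path
    if path.endswith(".html"):
        path = path[: -len(".html")]
    # Drop trailing slashes, but keep "/" for the site root.
    path = path.rstrip("/") or "/"
    return parts.netloc.lower() + path


def group_variants(urls):
    """Map each canonical key to the list of crawled URLs that share it."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonical_key(url)].append(url)
    return dict(groups)


# Hypothetical crawl output: three variants of one page plus one other page.
crawled = [
    "http://example.com/pagename",
    "http://example.com/pagename/",
    "http://example.com/pagename.html",
    "http://example.com/other",
]
groups = group_variants(crawled)
print(len(crawled), "URLs crawled,", len(groups), "distinct pages")
```

Running something like this over the URL list exported from a crawl gives a rough idea of how much of the page count comes from variants rather than real pages.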
Hope that helps.
Marcus
-
Hi Marcus,
Thanks for the reply.
Yes, the duplicate content report is quite large, but I am not certain why the number of pages crawled fluctuated by over 4,000.
The duplicate content number went down by over 2,000 last week, and then went straight back up again. So I am not sure if the crawler missed something, or if there was some other issue going on.
Cheers
-
Hey Chirag
As a first suggestion, I would take a look at the duplicate content report. You may see some pages with multiple page names / URLs, which gives a falsely inflated page count.