Tools that crawl 2-million-page sites
-
Our site is about 2 million pages deep, and 50% of that is stale content. Yes, I know: OMG #unhygienic. Even if we get approval to get rid of half of it, that still leaves around a million pages, and SEOmoz Pro Elite only crawls 20k deep. What can I do to crawl and diagnose the whole site? Are there any tools anyone can suggest? SEOmoz??
-
That's good to know. It sounds like that's probably the best way. I also use Screaming Frog (http://www.screamingfrog.co.uk/seo-spider/) to crawl sites, and with 2 GB of dedicated RAM it can crawl around 50k pages. If your site is structured in sub-folders, you might be able to break it into parts and crawl each part separately. If not, SEOmoz Enterprise looks like the way to go.
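A minimal sketch of the split-by-sub-folder approach, assuming you already have a flat list of the site's URLs (for example from XML sitemaps or server logs). The function name `split_by_subfolder`, the example.com URLs and the 50,000 batch size (taken from the Screaming Frog figure above) are illustrative assumptions, not features of any crawler.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Assumed batch size, based on the ~50k pages quoted for Screaming Frog with 2 GB RAM.
MAX_URLS_PER_CRAWL = 50_000


def split_by_subfolder(urls, max_per_crawl=MAX_URLS_PER_CRAWL):
    """Group URLs by their first sub-folder, then cut each group into
    batches no larger than max_per_crawl so each batch fits one crawl."""
    groups = defaultdict(list)
    for url in urls:
        parts = urlparse(url).path.lstrip("/").split("/", 1)
        # "/blog/post-1" -> "/blog/"; root-level pages such as "/about.html" -> "/"
        folder = "/" + parts[0] + "/" if len(parts) == 2 else "/"
        groups[folder].append(url)

    batches = []
    for folder in sorted(groups):
        members = groups[folder]
        for start in range(0, len(members), max_per_crawl):
            batches.append((folder, members[start:start + max_per_crawl]))
    return batches


sample = [
    "https://example.com/blog/a",
    "https://example.com/blog/b",
    "https://example.com/blog/c",
    "https://example.com/shop/x",
    "https://example.com/about.html",
]
for folder, batch in split_by_subfolder(sample, max_per_crawl=2):
    print(folder, len(batch))
```

At 50,000 URLs per batch, 2 million pages works out to at least 40 separate crawls (2,000,000 / 50,000 = 40), and more if some folders leave partly filled batches.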
-
There is an enterprise version of SEOmoz that covers 1 million pages a month and up to 30k keywords, which is well worth looking into if you have an enormous web property.
Related Questions
-
Pages with URL Too Long
Hello Mozzers! Moz keeps kindly telling me that our URLs are too long. However, this is largely due to the structure of our e-commerce site, whose URLs have to include the 'brand', 'range' and 'product' keywords. For example:
https://www.choicefurnituresuperstore.co.uk/Devonshire-Rustic-Oak-Bedside-Cabinet-1-Drawer-p40668.html
Moz recommends no more than 75 characters, which leaves us 25-30 characters for both the brand name and the product name. Questions:
If it is an issue, how do I fix it on my site?
If it's not an issue, how can we turn off this alert in Moz?
Does anyone know how big an issue URL length is as a ranking factor? I thought it was pretty low.
Moz Pro | tigersohelll
-
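For the URL-length question above, a quick way to see how far a URL overshoots the 75-character guideline. This sketch assumes the limit is measured against the full URL string, including scheme and host; `url_length_report` is a hypothetical helper, not part of Moz.

```python
from urllib.parse import urlparse

# 75 characters is the guideline quoted in the question.
# Assumption: it applies to the full URL, scheme and host included.
MAX_URL_LENGTH = 75


def url_length_report(url, limit=MAX_URL_LENGTH):
    """Break a URL's length into total, host and path, plus the overshoot."""
    parsed = urlparse(url)
    return {
        "total": len(url),
        "host": len(parsed.netloc),
        "path": len(parsed.path),
        "over_by": max(0, len(url) - limit),
    }


report = url_length_report(
    "https://www.choicefurnituresuperstore.co.uk/"
    "Devonshire-Rustic-Oak-Bedside-Cabinet-1-Drawer-p40668.html"
)
print(report)
```

Under that assumption, the scheme and host count toward the total, so a long domain name uses up part of the budget before any brand or product words are added.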
Duplicate page report
We exported our crawl diagnostics for duplicate URLs to a CSV spreadsheet after waiting 5 days without an answer on how Rogerbot can be made to filter them. My IT lead tells me the spreadsheet is labelled "duplicate URLs", and that is, quite literally, what it shows: the report seems to treat a database ID number as the only meaningful part of a URL. To replicate this, just filter the spreadsheet for any number you see on the page. For example, filtering for 1793 gives us the following result:
http://truthbook.com/faq/dsp_viewFAQ.cfm?faqID=1793
http://truthbook.com/index.cfm?linkID=1793
http://truthbook.com/index.cfm?linkID=1793&pf=true
http://www.truthbook.com/blogs/dsp_viewBlogEntry.cfm?blogentryID=1793
http://www.truthbook.com/index.cfm?linkID=1793
There are several problems with the above:
1. It lists the www result as well as the non-www result.
2. It treats the print version (&pf=true) as a duplicate, but those pages are blocked from Google with a noindex header tag.
3. It treats different sections of the website (FAQ, blogs, pages) that share the same ID number as the same thing.
In short, this particular report tells us nothing at all. I am trying to get a perspective from someone at SEOmoz to determine whether he is reading the result correctly or whether there is something he is missing. Please help.
Jim
Moz Pro | jimmyzig
-
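For the duplicate-report question above, a sketch of how the exported URLs could be re-grouped by normalized address instead of by shared ID number. The normalization rules (treat www and non-www as one host, ignore the `pf` print flag) are assumptions based on the problems listed in that post; this is not how Moz's report works, and reading the CSV is left out because its column names aren't given.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed rules, taken from the problems listed in the post:
# www and non-www are the same host, and "pf" only switches to the print view.
IGNORED_PARAMS = {"pf"}


def normalize(url):
    """Return a canonical form of the URL to use as the grouping key."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    query = [(k, v) for k, v in parse_qsl(parts.query) if k not in IGNORED_PARAMS]
    return urlunsplit((parts.scheme, host, parts.path, urlencode(query), ""))


def group_duplicates(urls):
    """Group URLs that share a normalized address; drop single-URL groups."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    return {key: members for key, members in groups.items() if len(members) > 1}


exported = [
    "http://truthbook.com/faq/dsp_viewFAQ.cfm?faqID=1793",
    "http://truthbook.com/index.cfm?linkID=1793",
    "http://truthbook.com/index.cfm?linkID=1793&pf=true",
    "http://www.truthbook.com/blogs/dsp_viewBlogEntry.cfm?blogentryID=1793",
    "http://www.truthbook.com/index.cfm?linkID=1793",
]
for key, members in group_duplicates(exported).items():
    print(key, members)
```

With the five URLs from that post, only the three `index.cfm?linkID=1793` variants end up in one group; the FAQ and blog entries that merely share the number 1793 stay separate.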
How do I make an SEO tools site?
Hello, I want to add SEO tools to my site for users. I want my visitors to be able to use a PageRank checker, link tracker, backlink checker, etc., as well as keyword tools, domain tools, and analytics and reporting, something like http://smallseotools.com/. What scripts do I need? Can I do all of this with WordPress? Thanks 🙂
Moz Pro | Wagdys
-
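For the SEO-tools-site question above: PageRank and backlink figures generally come from a third-party link index or API rather than from anything a script on your own server can compute, but on-page checks such as a simple link tracker can be built with standard tools. A minimal sketch using only Python's standard library, assuming the page HTML has already been fetched; `LinkCollector` and `split_links` are hypothetical names. Since WordPress runs on PHP, using a Python sketch like this from a WordPress site would mean running it as a separate service.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags, resolved against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(urljoin(self.base_url, value))


def split_links(html, page_url):
    """Return (internal, external) links found in a page's HTML."""
    collector = LinkCollector(page_url)
    collector.feed(html)
    collector.close()
    host = urlparse(page_url).netloc
    internal = [u for u in collector.links if urlparse(u).netloc == host]
    external = [u for u in collector.links if urlparse(u).netloc != host]
    return internal, external


page = '<p><a href="/about">About</a> <a href="https://other.org/page">Other</a></p>'
print(split_links(page, "https://example.com/index.html"))
```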
What SEO Tools Do You Use?
Things changed a bit for me when Raven stopped using Google and SEMrush data, and I haven't found a solution I'm happy with yet, so I just wanted to know what people use alongside SEOmoz, or instead of certain parts of SEOmoz. Before, I was using Raven Tools and Screaming Frog SEO Crawler; currently I'm using Screaming Frog SEO Crawler, SEOmoz (trial) and SerpIQ (trial).
Moz Pro | GBabyWilson
-
Can I prevent some pages from being crawled by the SEOmoz spider without affecting the Google spider?
Well, basically that's the question 😄 Can I prevent some pages from being crawled by the SEOmoz spider without affecting the Google spider? I have more than 10,000 pages on the website and I am not interested in having reports for many of them, but I still want SEO visits to them, so I want Google to crawl them easily... Thanks!
Moz Pro | MattDG
-
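For the question about blocking only the SEOmoz spider: a robots.txt group addressed to Moz's crawler, rogerbot, should keep it out of a section without affecting Googlebot, because each crawler follows the group that matches its own user agent. The sketch below checks that logic with Python's standard `urllib.robotparser`; the `/archive/` path and example.com URLs are hypothetical stand-ins for the pages you don't want reported on, and real crawlers use their own parsers, so confirm the exact user-agent name against Moz's documentation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: only the group addressed to rogerbot has a rule,
# so Googlebot finds no matching group and is allowed everywhere.
ROBOTS_TXT = """\
User-agent: rogerbot
Disallow: /archive/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

test_url = "https://example.com/archive/old-page.html"
for agent in ("rogerbot", "Googlebot"):
    verdict = "allowed" if parser.can_fetch(agent, test_url) else "blocked"
    print(f"{agent}: {verdict}")
```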
SEOmoz tool issue?
Hi Mozzers, I am doing web maintenance for a client, and for weeks Moz has been detecting 49 duplicate pages (contact pages). I thought I had resolved the issue by creating an XML sitemap that excludes those duplicates, but the Moz tool still detects them. So I searched for some of these duplicates to check whether they were indexed, and none of them were. So my question is: has anyone recently experienced similar issues? Is the Moz tool not 100% accurate? Thanks for sharing your thoughts and answers.
Moz Pro | Ideas-Money-Art
-
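For the duplicate contact pages question: leaving URLs out of an XML sitemap doesn't stop a crawler from reaching them through internal links, which may be why the tool keeps reporting them. To check whether the flagged pages really are identical, here is a minimal exact-match sketch, assuming you've already downloaded each page's HTML into a dict. The regex tag stripping is deliberately crude, the example URLs are hypothetical, and it is not meant to reproduce Moz's own duplicate detection.

```python
import hashlib
import re
from collections import defaultdict


def content_fingerprint(html):
    """Hash of the page text with tags stripped and whitespace collapsed."""
    text = re.sub(r"<[^>]+>", " ", html)
    text = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def find_exact_duplicates(pages):
    """pages maps URL -> HTML; return groups of URLs whose text is identical."""
    groups = defaultdict(list)
    for url, html in pages.items():
        groups[content_fingerprint(html)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


pages = {
    "https://example.com/contact": "<h1>Contact us</h1><p>Call 555-0100</p>",
    "https://example.com/contact?ref=footer": "<h1>Contact  us</h1>\n<p>Call 555-0100</p>",
    "https://example.com/about": "<h1>About</h1>",
}
print(find_exact_duplicates(pages))
```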
Can Google see all the pages that an SEOmoz crawl picks up?
Hi there, my client's site is showing around 90 pages indexed in Google, while the SEOmoz crawl returns 1,934 pages. Many of the pages in the crawl are duplicates, but there are also pages behind the user login. Is it theoretically correct to say that if an SEOmoz crawl finds all these pages, then Google has the potential to find them as well, even if it chooses not to index them? Or would Google not see the pages behind the login? And how come SEOmoz can see those pages? Many thanks in anticipation! Wendy
Moz Pro | Chammy
-
Can we add sites to the crawl queue for OSE?
Is it possible to request that Open Site Explorer crawl a new URL on its next run? This tool is the first place I go when working on a new site, and when there is "No Data Available" it is a little frustrating. I fully appreciate that this lack of data is usually a signal that the website is either very new or of low quality; however, that is often the reason I am brought in, and I would very much like to benchmark and provide initial analysis using this tool. It would make sense for OSE to crawl the sites that Moz members are working on, wouldn't it? Scott.
Moz Pro | eseyo