Internal Duplicate Content Question...
-
We are looking for an internal duplicate content checker that is capable of crawling a site that has over 300,000 pages. We have looked over Moz's duplicate content tool and it seems like it is somewhat limited in how deep it crawls. Are there any suggestions on the best "internal" duplicate content checker that crawls deep in a site?
-
If you want to a free test to crawl use this
https://www.deepcrawl.com/forms/free-crawl-report/
Please remember that URIs & URLs are different so your site with 300,000 URLs might have 600,000 URIs if you want to see how it works for free you can sign up for a free crawl for your first 10,000 pages.
I am not affiliated with the company aside from being a very happy customer.
-
Far no way the Best is going to be deep Crawl it automatically connects to Google Webmaster tools and analytics.
it can crawl constantly for ever. The real advantage is setting it to five URLs per second and depending on the speed of your server it will do it consistently I would not go over five pages per second. Make sure that you pick a dynamic IP structuring if you do not have a strong web application firewall if you do pick a single static IP then you can crawl the entire tire site without issue by white listing it. Now this is my personal opinion and I know what you're asking to be accomplished in the literally no time compared to other systems using deep crawl deepcrawl.com
It will show you what duplicate content is contained inside your website duplicate URLs what duplicate title tags you name it.
https://www.deepcrawl.com/knowledge/best-practice/seven-duplicate-content-issues/
https://www.deepcrawl.com/knowledge/news/google-webmaster-hangout-highlights-08102015/
You have a decent sized website and I would recommend adding a free edition of Robotto.org Robotto, can detect whether a preferredwww or non-www option has been configured correctly.
A lot of issues with web application firewall and CDNs you name it can be detected using the school and the combination of them is a real one-two punch. I honestly think that you will be happy with this tool. I have had issues with anything local like screaming frog when crawling surcharge websites you do not want to depend on your desktop ram. I hope you will let me know if this is a good solution for you I know that it works very very well and it will not stop crawling until it finds everything. Your site will be finished before 24 hours are done.
-
Correct, Thomas. We are not looking to restructure the site at this time but we are looking for a program that will crawl 300,000 plus pages and let us know which internal pages are duplicated.
-
If the tool has to crawl more than a crawl depth of 100 it is very common to find something that's able to do it. Like a said deep crawl, screaming frog & Moz is but you're talking about finding content that shouldn't be restructured.
-
If you looking for the most powerful tool for crawling websites deepcrawl.com is the king. Screaming frog it Is good but is dependent on RAM on your desktop. And does not have as many features as deep crawl
https://www.deepcrawl.com/knowledge/news/google-webmaster-hangout-highlights-08102015/
-
Check out Siteliner. I've never tried it with a site that big, personally. But it's free, so worth a shot to see what you can get out of it.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Tools to scan entire site for duplicate content?
HI guys, Just wondering if anyone knows of any tools to scan a site for duplicate content (with other sites on the web). Looking to quickly identify product pages containing duplicate content/duplicate product descriptions for E-commerce based websites. I know copy scape can which can check up to 10,000 pages in a single operation with Batch Search. But just wondering if there is anything else on the market i should consider looking at? Cheers, Chris
Intermediate & Advanced SEO | | jayoliverwright0 -
What is considered duplicate content?
Hi, We are working on a product page for bespoke camper vans: http://www.broadlane.co.uk/campervans/vw-campers/bespoke-campers . At the moment there is only one page but we are planning add similar pages for other brands of camper vans. Each page will receive its specifically targeted content however the 'Model choice' cart at the bottom (giving you the choice to select the internal structure of the van) will remain the same across all pages. Will this be considered as duplicate content? And if this is a case, what would be the ideal solution to limit penalty risk: A rel canonical tag seems wrong for this, as there is no original item as such. Would an iFrame around the 'model choice' enable us to isolate the content from being indexed at the same time than the page? Thanks, Celine
Intermediate & Advanced SEO | | A_Q0 -
How do I use public content without being penalized for duplication?
The NHTSA produces a list of all recalls for automobiles. In their "terms of use" it states that the information can be copied. I want to add that to our site, so there is an up-to-date list for our audience to see. However, I'm just copying and pasting. I'm allowed to according to NHTSA, but google will probably flag it right? Is there a way to do this without being penalized? Thanks, Ruben
Intermediate & Advanced SEO | | KempRugeLawGroup1 -
Duplicate content issue - online retail site.
Hello Mozzers, just looked at a website and just about every product page (there are hundreds - yikes!) is duplicated like this at end of each url (see below). Surely this is a serious case of duplicate content? Any idea why a web developer would do this? Thanks in advance! Luke prod=company-081
Intermediate & Advanced SEO | | McTaggart
prod=company-081&cat=20 -
Duplicate content for hotel websites - the usual nightmare? is there any solution other than producing unique content?
Hiya Mozzers I often work for hotels. A common scenario is the hotel / resort has worked with their Property Management System to distribute their booking availability around the web... to third party booking sites - with the inventory goes duplicate page descriptions sent to these "partner" websites. I was just checking duplication on a room description - 20 loads of duplicate descriptions for that page alone - there are 200 rooms - so I'm probably looking at 4,000 loads of duplicate content that need rewriting to prevent duplicate content penalties, which will cost a huge amount of money. Is there any other solution? Perhaps ask booking sites to block relevant pages from search engines?
Intermediate & Advanced SEO | | McTaggart0 -
Copying my Facebook content to website considered duplicate content?
I write career advice on Facebook on a daily basis. On my homepage users can see the most recent 4-5 feeds (using FB social media plugin). I am thinking to create a page on my website where visitors can see all my previous FB feeds. Would this be considered duplicate content if I copy paste the info, but if I use a Facebook social media plugin then it is not considered duplicate content? I am working on increasing content on my website and feel incorporating FB feeds would make sense. thank you
Intermediate & Advanced SEO | | knielsen0 -
Get Duplicate Page content for same page with different extension ?
I have added a campaign like "Bannerbuzz" in SEOMOZ Pro account and before 2 or 3 days i got errors related to duplicate page content . they are showing me same page with different extension. As i mentioned below http://www.bannerbuzz.com/outdoor-vinyl-banners.html
Intermediate & Advanced SEO | | CommercePundit
&
http://www.bannerbuzz.com/outdoor_vinyl_banner.php We checked our whole source files but we didn't define php related urls in our source code. we want to catch only our .html related urls. so, Can you please guide us to solve this issue ? Thanks <colgroup><col width="857"></colgroup>
| http://www.bannerbuzz.com/outdoor-vinyl-banners.html |0 -
SEOMoz Internal Dupe. Content & Possible Coding Issues
SEOmoz Community! I have a relatively complicated SEO issue that has me pretty stumped... First and foremost, I'd appreciate any suggestions that you all may have. I'll be the first to admit that I am not an SEO expert (though I am trying to be). Most of my expertise is with PPC. But that's beside the point. Now, the issues I am having: I have two sites: http://www.federalautoloan.com/Default.aspx and http://www.federalmortgageservices.com/Default.aspx A lot of our SEO efforts thus-far have done good for Federal Auto Loan... and we are seeing positive impacts from them. However, we recently did a server transfer (may or may not be related)... and since that time a significant number of INTERNAL duplicate content pages have appeared through the SEOmoz crawler. The number is around 20+ for both Federal Auto Loan and Federal Mortgage Services (see attachments). I've tried to include as much as I can via the attachments. What you will see is all of the content pages (articles) with dupe. content issues along with a screen capture of the articles being listed as duplicate for the pages: Car Financing How It Works A Home Loan is Possible with Bad Credit (Please let me know if you could use more examples) At first I assumed it was simply an issue with SEOmoz... however, I am now worried it is impacting my sites (I wasn't originally because Federal Auto Loan has great quality scores and is climbing in organic presence daily). That being said, we recently launched Federal Mortgage Services for PPC... and my quality scores are relatively poor. In fact, we are not even ranking (scratch that, not even showing that we have content) for "mortgage refinance" even though we have content (unique, good, and original content) specifically around "mortgage refinance" keywords. All things considered, Federal Mortgage Services should be tighter in the SEO department than Federal Auto Loan... but it is clearly not! I could really use some significant help here... Both of our sites have a number of access points: http://www.federalautoloan.com/Default.aspx and http://www.federalmortgageservices.com/Default.aspx are both the designated home pages. And I have rel=canonical tags stating such. However, my sites can also be reached via the following: http://www.federalautoloan.com http://www.federalautoloan.com/default.aspx http://www.federalmortgageservices.com http://www.federalmortgageservics.com/default.aspx Should I incorporate code that "redirects" traffic as well? Or is it fine with just the relevancy tags? I apologize for such a long post, but I wanted to include as much as possible up-front. If you have any further questions... I'll be happy to include more details. Thank you all in advance for the help! I greatly appreciate it! F7dWJ.png dN9Xk.png dN9Xk.png G62JC.png ABL7x.png 7yG92.png
Intermediate & Advanced SEO | | WPColt0