Internal Duplicate Content Question...
-
We are looking for an internal duplicate content checker that is capable of crawling a site that has over 300,000 pages. We have looked over Moz's duplicate content tool and it seems like it is somewhat limited in how deep it crawls. Are there any suggestions on the best "internal" duplicate content checker that crawls deep in a site?
-
If you want to a free test to crawl use this
https://www.deepcrawl.com/forms/free-crawl-report/
Please remember that URIs & URLs are different so your site with 300,000 URLs might have 600,000 URIs if you want to see how it works for free you can sign up for a free crawl for your first 10,000 pages.
I am not affiliated with the company aside from being a very happy customer.
-
Far no way the Best is going to be deep Crawl it automatically connects to Google Webmaster tools and analytics.
it can crawl constantly for ever. The real advantage is setting it to five URLs per second and depending on the speed of your server it will do it consistently I would not go over five pages per second. Make sure that you pick a dynamic IP structuring if you do not have a strong web application firewall if you do pick a single static IP then you can crawl the entire tire site without issue by white listing it. Now this is my personal opinion and I know what you're asking to be accomplished in the literally no time compared to other systems using deep crawl deepcrawl.com
It will show you what duplicate content is contained inside your website duplicate URLs what duplicate title tags you name it.
https://www.deepcrawl.com/knowledge/best-practice/seven-duplicate-content-issues/
https://www.deepcrawl.com/knowledge/news/google-webmaster-hangout-highlights-08102015/
You have a decent sized website and I would recommend adding a free edition of Robotto.org Robotto, can detect whether a preferredwww or non-www option has been configured correctly.
A lot of issues with web application firewall and CDNs you name it can be detected using the school and the combination of them is a real one-two punch. I honestly think that you will be happy with this tool. I have had issues with anything local like screaming frog when crawling surcharge websites you do not want to depend on your desktop ram. I hope you will let me know if this is a good solution for you I know that it works very very well and it will not stop crawling until it finds everything. Your site will be finished before 24 hours are done.
-
Correct, Thomas. We are not looking to restructure the site at this time but we are looking for a program that will crawl 300,000 plus pages and let us know which internal pages are duplicated.
-
If the tool has to crawl more than a crawl depth of 100 it is very common to find something that's able to do it. Like a said deep crawl, screaming frog & Moz is but you're talking about finding content that shouldn't be restructured.
-
If you looking for the most powerful tool for crawling websites deepcrawl.com is the king. Screaming frog it Is good but is dependent on RAM on your desktop. And does not have as many features as deep crawl
https://www.deepcrawl.com/knowledge/news/google-webmaster-hangout-highlights-08102015/
-
Check out Siteliner. I've never tried it with a site that big, personally. But it's free, so worth a shot to see what you can get out of it.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Avoiding duplicate content in manufacturer's [of single product] website
Hello, So I have read a lot of articles about duplicate content/ keyword canibalism/ competing with yourself, and so on. But none of these articles really fit to manufacturer website who produces one product. For example, lets say I make ceramic tiles, this means: Homepage: "Our tiles are the best tiles, we have numerous designs of tiles. We make them only from natural ceramic" Product list: "Here is a list of our tiles: Poesia tile, white tile, textured tile, etc" Page for each tile: Gallery: a bunch of images trying to prove that these tiles look best 🙂 Where to buy page: a map From what I understand this page is already doomed - it will not go well against larger retailers who don't focus only on tiles but they sell everything. This page is set to have a lot of duplicate content. But I hope I am wrong, can someone please make some suggestions how to do SEO on such a website where all pages are about the same thing? Any help would be much appreciated! Juris
Intermediate & Advanced SEO | | JurisBBB0 -
Shall we add engaging and useful FAQ content in all our pages or rather not because of duplication and reduction of unique content?
We are considering to add at the end of alll our 1500 product pages answers to the 9 most frequently asked questions. These questions and answers will be 90% identical for all our products and personalizing them more is not an option and not so necessary since most questions are related to the process of reserving the product. We are convinced this will increase engagement of users with the page, time on page and it will be genuinely useful for the visitor as most visitors will not visit the seperate FAQ page. Also it will add more related keywords/topics to the page.
Intermediate & Advanced SEO | | lcourse
On the downside it will reduce the percentage of unique content per page and adds duplication. Any thoughts about wether in terms of google rankings we should go ahead and benefits in form of engagement may outweight downside of duplication of content?0 -
How to fix Duplicate Content Warnings on Pagination? Indexed Pagination?
Hi all! So we have a Wordpress blog that properly has pagination tags of rel="prev" and rel="next" set up for pages, but we're still getting crawl errors with MOZ for duplicate content on all of our pagination pages. Also, we are having all of our pages indexed as well. I'm talking pages as deep as page 89 for the home page. Is this something I should ignore? Is it hurting my SEO potentially? If so, how can I start tackling it for a fix? Would "noindex" or "nofollow" be a good idea? Any help would be greatly appreciated!
Intermediate & Advanced SEO | | jampaper0 -
Search console, duplicate content and Moz
Hi, Working on a site that has duplicate content in the following manner: http://domain.com/content
Intermediate & Advanced SEO | | paulneuteboom
http://www.domain.com/content Question: would telling search console to treat one of them as the primary site also stop Moz from seeing this as duplicate content? Thanks in advance, Best, Paul. http0 -
Worldwide Stores - Duplicate Content Question
Hello, We recently added new store views for our primary domain for different countries. Our primary url: www.store.com Different Countries URLS: www.store.com/au www.store.com/uk www.store.com/nz www.store.com/es And so forth and so on. This resulted in an almost immediate rankings drop for several keywords which we feel is a result of duplicate content creation. We've thousands of pages on our primary site. We've assigned a "no follow" tags to all store views for now, and trying to roll back the changes we did. However, we've seen some stores launching in different countries with same content, but with a country specific extensions like .co.uk, .co.nz., .com.au. At this point, it appears we have three choices: 1. Remove/Change duplicate content in country specific urls/store views. 2. Launch using .co.uk, .com.au with duplicate content for now. 3. Launch using .co.uk, .com.au etc with fresh content for all stores. Please keep in mind, option 1, and 3 can get very expensive keeping hundreds of products in untested territories. Ideally, we would like test first and then scale. However, we'd like to avoid any duplicate penalties on our main domain. Thanks for your help and answers on the same.
Intermediate & Advanced SEO | | globaleyeglasses0 -
Duplicate content for images
On SEOmoz I am getting duplicate errors for my onsite report. Unfortunately it does not specify what that content is... We are getting these errors for our photo gallery and i am assuming that the reason is some of the photos are listed in multiple categories. Can this be the problem? what else can it be? how can we resolve these issues?
Intermediate & Advanced SEO | | SEODinosaur0 -
Google consolidating link juice on duplicate content pages
I've observed some strange findings on a website I am diagnosing and it has led me to a possible theory that seems to fly in the face of a lot of thinking: My theory is:
Intermediate & Advanced SEO | | James77
When google see's several duplicate content pages on a website, and decides to just show one version of the page, it at the same time agrigates the link juice pointing to all the duplicate pages, and ranks the 1 duplicate content page it decides to show as if all the link juice pointing to the duplicate versions were pointing to the 1 version. EG
Link X -> Duplicate Page A
Link Y -> Duplicate Page B Google decides Duplicate Page A is the one that is most important and applies the following formula to decide its rank. Link X + Link Y (Minus some dampening factor) -> Page A I came up with the idea after I seem to have reverse engineered this - IE the website I was trying to sort out for a client had this duplicate content, issue, so we decided to put unique content on Page A and Page B (not just one page like this but many). Bizarrely after about a week, all the Page A's dropped in rankings - indicating a possibility that the old link consolidation, may have been re-correctly associated with the two pages, so now Page A would only be getting Link Value X. Has anyone got any test/analysis to support or refute this??0 -
Does a mobile site count as duplicate content?
Are there any specific guidelines that should be followed for setting up a mobile site to ensure it isn't counted as duplicate content?
Intermediate & Advanced SEO | | nicole.healthline0