Internal Duplicate Content Question...
-
We are looking for an internal duplicate content checker that is capable of crawling a site that has over 300,000 pages. We have looked over Moz's duplicate content tool and it seems like it is somewhat limited in how deep it crawls. Are there any suggestions on the best "internal" duplicate content checker that crawls deep in a site?
-
If you want a free test crawl, use this:
https://www.deepcrawl.com/forms/free-crawl-report/
Please remember that URIs and URLs are different, so your site with 300,000 URLs might have 600,000 URIs. If you want to see how it works, you can sign up for a free crawl of your first 10,000 pages.
I am not affiliated with the company aside from being a very happy customer.
-
By far the best is going to be DeepCrawl. It automatically connects to Google Webmaster Tools and Analytics.
It can crawl continuously. The real advantage is that you can set it to five URLs per second and, depending on the speed of your server, it will hold that rate consistently. I would not go over five pages per second. Make sure that you pick the dynamic IP option if you do not have a strong web application firewall. If you do have one, pick a single static IP and whitelist it; then you can crawl the entire site without issue. This is my personal opinion, but I know that what you're asking for can be accomplished in almost no time compared to other systems by using DeepCrawl (deepcrawl.com).
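As a rough illustration of what a five-URLs-per-second limit means in practice, here is a minimal Python sketch of a crawl throttle. It is not how DeepCrawl works internally; `Throttle`, `max_per_second`, and the injectable `clock` and `sleep` arguments are hypothetical names chosen for this example.

```python
import time


class Throttle:
    """Space out calls so that at most `max_per_second` start per second."""

    def __init__(self, max_per_second=5.0, clock=time.monotonic, sleep=time.sleep):
        if max_per_second <= 0:
            raise ValueError("max_per_second must be positive")
        self.min_interval = 1.0 / max_per_second
        self._clock = clock          # injectable so the class can be tested
        self._sleep = sleep
        self._next_allowed = None    # earliest time the next request may start

    def wait(self):
        """Block until the next request is allowed, then reserve the slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            # Never schedule from a point earlier than the reserved slot.
            now = max(self._clock(), self._next_allowed)
        self._next_allowed = now + self.min_interval
```

A crawler loop would call `throttle.wait()` immediately before each request, so a slow server naturally lowers the rate while a fast one is still capped at five requests per second.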
It will show you what duplicate content your website contains: duplicate URLs, duplicate title tags, you name it.
https://www.deepcrawl.com/knowledge/best-practice/seven-duplicate-content-issues/
https://www.deepcrawl.com/knowledge/news/google-webmaster-hangout-highlights-08102015/
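To show what a duplicate report of this kind boils down to, here is a minimal Python sketch. It assumes the pages have already been crawled into a dict mapping URL to HTML. `find_duplicates` and its helpers are hypothetical names, not part of DeepCrawl or any other tool mentioned in this thread, and the sketch only catches exact duplicates after case and whitespace normalization.

```python
import hashlib
from collections import defaultdict
from html.parser import HTMLParser


class _PageText(HTMLParser):
    """Collect the <title> and the visible text of one HTML page."""

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)
        elif not self._skip_depth:
            self.text_parts.append(data)


def _normalize(text):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def find_duplicates(pages):
    """Return (duplicate_bodies, duplicate_titles) as lists of URL groups.

    `pages` maps URL -> raw HTML. Only exact matches after normalization
    are reported; near-duplicates would need a fuzzier comparison.
    """
    by_body = defaultdict(list)
    by_title = defaultdict(list)
    for url, html in pages.items():
        parser = _PageText()
        parser.feed(html)
        parser.close()
        body = _normalize(" ".join(parser.text_parts))
        title = _normalize(" ".join(parser.title_parts))
        by_body[hashlib.sha256(body.encode("utf-8")).hexdigest()].append(url)
        if title:
            by_title[title].append(url)
    dup_bodies = [sorted(urls) for urls in by_body.values() if len(urls) > 1]
    dup_titles = [sorted(urls) for urls in by_title.values() if len(urls) > 1]
    return dup_bodies, dup_titles
```

Hashing the normalized body keeps memory use per page small, which matters at 300,000 pages. For a site that size you would feed pages in as they are crawled instead of holding all the HTML in one dict.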
You have a decent-sized website, and I would recommend adding the free edition of Robotto (Robotto.org). Robotto can detect whether a preferred www or non-www option has been configured correctly.
A lot of issues with web application firewalls, CDNs, you name it, can be detected using this tool, and the combination of the two is a real one-two punch. I honestly think that you will be happy with it. I have had issues with anything local, like Screaming Frog, when crawling such large websites; you do not want to depend on your desktop's RAM. I hope you will let me know if this is a good solution for you. I know that it works very well, and it will not stop crawling until it finds everything. Your site will be finished within 24 hours.
-
Correct, Thomas. We are not looking to restructure the site at this time but we are looking for a program that will crawl 300,000 plus pages and let us know which internal pages are duplicated.
-
If the tool has to crawl to a depth of more than 100, it is very common to find something that's able to do it. Like I said, DeepCrawl, Screaming Frog and Moz can, but you're talking about finding content that shouldn't be restructured.
-
If you are looking for the most powerful tool for crawling websites, deepcrawl.com is the king. Screaming Frog is good, but it depends on the RAM in your desktop and does not have as many features as DeepCrawl.
https://www.deepcrawl.com/knowledge/news/google-webmaster-hangout-highlights-08102015/
-
Check out Siteliner. I've never tried it with a site that big, personally. But it's free, so worth a shot to see what you can get out of it.
Related Questions
-
Duplicate content in Shopify - subsequent pages in collections
Hello everyone! I hope an expert in this community can help me verify that the canonical code I'll add to our store is correct. Currently, in our Shopify store, the subsequent pages in the collections are not indexed by Google; however, the canonical URLs on these pages aren't pointing to the main collection page (page 1), e.g. the URLs of page 2, page 3, etc. are used as canonical URLs instead of the first page of the collection. I have attached the canonical code below. It would be much appreciated if an expert could urgently verify that this code is good to use and will solve the above issue. Thanks so much for your kind help in advance!! -----------------CODES BELOW---------------
<title>{{ page_title }}{% if current_tags %} – tagged "{{ current_tags | join: ', ' }}"{% endif %}{% if current_page != 1 %} – Page {{ current_page }}{% endif %}{% unless page_title contains shop.name %} – {{ shop.name }}{% endunless %}</title>
{% if page_description %} {% endif %} {% if current_page != 1 %} {% else %} {% endif %}
{% if template == 'collection' %}{% if collection %}
{% if current_page == 1 %} {% endif %}
{% if template == 'product' %}{% if product %} {% endif %}
{% if template == 'collection' %}{% if collection %} {% endif %}
Intermediate & Advanced SEO | | ycnetpro101
-
Shall we add engaging and useful FAQ content in all our pages or rather not because of duplication and reduction of unique content?
We are considering adding answers to the 9 most frequently asked questions at the end of all our 1500 product pages. These questions and answers will be 90% identical for all our products. Personalizing them more is not an option, and not really necessary, since most questions relate to the process of reserving the product. We are convinced this will increase user engagement with the page and time on page, and it will be genuinely useful for visitors, as most of them will not visit the separate FAQ page. It will also add more related keywords/topics to the page. On the downside, it will reduce the percentage of unique content per page and add duplication. Any thoughts about whether, in terms of Google rankings, we should go ahead, and whether the benefits in the form of engagement may outweigh the downside of duplicated content?
Intermediate & Advanced SEO | | lcourse
-
Question about moving content from one site to another without a 301
I could use a second opinion about moving content from some inactive sites to my main site. Once upon a time, we had a handful of geotargeted websites set up targeting various cities that we serve. This was in addition to our main site, which was mostly targeted to our primary office and ranked great for those keywords. Our main site has plenty of authority, has been around for ages, etc. We built out these geo-targeted sites with some good landing pages and kept them active with regularly scheduled blog posts which were unique and either interesting or helpful. Although we had a little success with these, we eventually saw the light and realized that our main site was strong enough to rank for these cities as well, which made life a whole lot easier, not to mention a lot less spammy. We've got some good content on these other sites that I'd like to use on our main site, especially the blog posts. Now that I've got it through my head that there's no such thing as a duplicate content penalty, I understand that I could just start moving this content over so long as I put a 301 redirect in place where the content used to be on these old sites. Which leads me to my question. Our SEO was careful not to have these other websites pointing to our main site to avoid looking like we were trying to do something shady from a link building perspective. His concern is that these redirects would undermine that effort and having a bunch of redirects from a half dozen sites could end up hurting us somehow. Do you think that is the case? What he is suggesting we do is remove all of the content that we'd like to use and use Webmaster Tools to request that this content be removed from the index. Then, after the sites have been recrawled, we'll check for ourselves to confirm they've been removed and proceed with using the content however we'd like. Thoughts?
Intermediate & Advanced SEO | | LeeAbrahamson
-
Duplicate Internal Content on E-Commerce Website
Hi, I find my e-commerce pharmacy website is full of little snippets of duplicate content. In particular:
- delivery info widget repeated on all the product pages
- product category information repeated on product pages (e.g. all medicines belonging to a certain category of medicines have identical side effects, and I also include a generic snippet about the condition the medicine treats)
Do you think it will harm my rankings to do this?
Intermediate & Advanced SEO | | deelo555
-
International SEO Question
The company I work for has a website, www.example.com, that ranks very well in English-speaking countries: US, UK, CA. For legal reasons, we now need to create www.example.co.uk to be accessible and rank in google.co.uk. Obviously we want this change to be as smooth as possible with little effect on rankings in the UK. We have two options that we're talking through at the moment:
Option 1: Use the hreflang tag on both the .com and the .co.uk to tell Google which site to rank in each country. My worry with this is that we might lose our rankings in the UK, as it will be a brand new site with little to no links pointing to it.
Option 2: 301 redirect to the .co.uk based on UK IP addresses. I'm skeptical about this. As a 301 passes most of the link juice, I'm not sure how Google would treat this type of thing. Would the .com lose ranking?
So my questions are: would we lose ranking in the UK if we use option 1? Would option 2 work? What would you do? Any help is appreciated.
Intermediate & Advanced SEO | | awestwood
-
Joomla duplicate content
My website report says that http://www.enigmacrea.com/diseno-grafico-portafolio-publicidad and http://www.enigmacrea.com/diseno-grafico-portafolio-publicidad?limitstart=0 have the same content, so I have duplicate pages. The only difference is the ?limitstart=0 parameter. How can I fix this? Thanks in advance.
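One way to confirm that two such URLs are the same page under different addresses is to normalize them and compare the results. The Python sketch below is an illustration only: `canonicalize` and `DEFAULT_PARAMS` are hypothetical names, and treating `limitstart=0` as equivalent to no parameter at all is an assumption based on the report described in this question, not something taken from Joomla documentation.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these (name, value) pairs select the default view and can be dropped.
DEFAULT_PARAMS = {("limitstart", "0")}


def canonicalize(url):
    """Drop default-valued query parameters and the fragment from a URL."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if (name, value) not in DEFAULT_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

If both URLs normalize to the same string, that string is a reasonable candidate for the address a canonical link or redirect on the parameterized URL would point to.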
Intermediate & Advanced SEO | | kuavicrea
-
Duplicate content issue for franchising business
Hi all. We are in the process of adding a franchise model to our existing stand-alone business, and as part of the package given to the franchisee there will be a website with content identical to our existing website, apart from some minor details such as contact and address details. This creates a huge duplicate content issue, and even if we implement a canonical approach, this will still be unfair to the franchisee in terms of their marketing and own SEO efforts. The URL for each franchise will be unique, but the content will be the same to a large extent. The nature of the service we offer (professional qualifications) is such that the "products" can only be described in a certain way, and it will be near on impossible to have a unique set of "product" pages for each franchisee. I hope that some of you have come across a similar problem or have suggestions or ideas for us to get round this. Kind regards, Peter
Intermediate & Advanced SEO | | masterpete
-
Duplicate content - canonical vs link to original and Flash duplication
Here's the situation for the website in question: The company produces printed publications which go online as a page turning Flash version, and as a separate HTML version. To complicate matters, some of the articles from the publications get added to a separate news section of the website. We want to promote the news section of the site over the publications section. If we were to forget the Flash version completely, would you: a) add a canonical in the publication version pointing to the version in the news section? b) add a link in the footer of the publication version pointing to the version in the news section? c) both of the above? d) something else? What if we add the Flash version into the mix? As Flash still isn't as crawlable as HTML should we noindex them? Is HTML content duplicated in Flash as big an issue as HTML to HTML duplication?
Intermediate & Advanced SEO | | Alex-Harford