What is Considered Duplicate Content by Crawlers?
-
I am asking this because I have a couple of site audit tools that I use to crawl a site I work on every week and they are showing duplicate content issues (which I know there is a lot on this site) but some of what is flagged as duplicate content makes no sense.
For example, the following URL's were grouped together as duplicate content:
|
https://www.firefold.com/contact-us
|
| https://www.firefold.com/sale |
|
|
How are these pages duplicate content? I am confused on what site audit tools are considering duplicate content.
Just FYI, this is data from Moz crawl diagnostics but SEMrush site auditor is giving me the same type of data.
Any help would be greatly appreciated.
Ryan
-
Yea I just started working on this site. I haven't used Moz Analytics much so just wanting to see how their crawler crawls pages.
And yes I agree, there are a lot of BIG BIG BIG issues with this site.
I got a large workload over the next few months haha.
-
I would add that there's is no text on any of those three pages - any "text" one would see there is actually just embedded in an image - which is a huge issue for a number of reasons:
- Search engines see that there's no text - a big no-no.
- You're getting practically no SEO value from the content that would be there, even if there isn't much.
- It's heavier this way - which makes load times slower.
I want to clarify that there are many, bigger issues with these pages - but as your question concerns only duplicate content, I'll leave all of that out for the time being. To summarize, Google, Yahoo, and Bing are just seeing some duplicate banners, sidebars, etc. and then some images in the body of your pages. Hence, duplicate content.
-
Thanks for that information.
It makes sense looking at the data and pages from that perspective.
-
Hi Ryan!
Our crawler will flag pages that have at least 90% similarity in the entire source code of the site so not just the body.
The way you want to interpret the report is the contact-us page has 35 duplicates, so "gabe" and "sale" are not dupes of each other in this section but are only each a duplicate of "contact-us". Those URLs might appear with their own duplicates of the same pages further down in the report.
While on the front end the pages do not appear to be similar. The issue is likely with the amount of javascript code on those pages.
Our crawler cannot read javascript so we are likely only able to see the template of the page. Other search tools are probably seeing the same thing as it returns 79% similarity using this tool: http://www.freebulkseotools.com/similar-page-checker-tool.php
I can't provide much insight from a dev perspective but hope this helps!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Content suggestion
Hello, What is the secret sauce of your content suggestion. How do you consider that a topic is covered. Please explain. Thank you, Flvien
Moz Bar | | seoanalytics0 -
What do I do with content suggestions to help you rank higher?
I am looking at the Page Optimization on a post. Under content suggestions to help you rank higher - Are they recommending that I use the recommend anchor text and use one of the top ranking websites to link to those?
Moz Bar | | RoniFaida1 -
Is MOZ any good to analyze an e-commerce site? How come that a cms page can be seen as duplicate content with a category page?
Hi Guys, I've been using Moz for quite a long time now for 2 of my shops. Now I am in the process of launching the second shop and I just don't understand how is it possible that a cms static page (About US) to be seen as a duplicate content with other 96 pages - including product pages and other totally different pages such as delivery information, category pages, returns and so on. Really MOZ?? Is it me or you?? Your help would be much appreciated! Thank you!
Moz Bar | | Sorin_T0 -
Duplicate Page Content
The site crawl is registering duplicate page content for our storefront site, but the pages aren't the same. They're ascending pages within the same category (ex: Featured, Featured pg2, Featured pg3, and so on). What can be done to fix these errors or prevent them in the future?
Moz Bar | | MGuid550 -
Moz Crawler URL paramaters & duplicate content
Hi all, this is my first post on Moz Q&A 🙂 Questions: Does the Moz Crawler take into account rel="canonical" for search results pages with sorting / filtering URL parameters? How much time does it take for an issue to disappear from the issues list after it's been corrected? Does it come op in the next weekly report? I'm asking because the crawler is reporting 50k+ pages crawled, when in reality, this number should be closer to 1000. All pages with query parameters have the correct canonical tag pointing to the root URL, so I'm wondering whether I need to noindex the other pages for the crawler to report correct data?: Original (canonical URL): DOMAIN.COM/charters/search/mx/BS?search_location=cabo-san-lucas Filter active URL: DOMAIN.COM/charters/search/mx/BS?search_location=cabo-san-lucas&booking_date=&booking_days=1&booking_persons=1&priceFilter%5B%5D=0%2C500&includedPriceFilter%5B%5D=drinks-soft Also, if noindex is the only solution, will it impact the ranking of the pages involved? Note: Google and Bing are semi-successful in reporting index page count, each reporting around 2.5k result pages when using the site:DOMAIN.com query. The rel canonical tag was missing for a short period of time about 4 weeks ago, but since fixing the issue these pages still haven't been deindexed. Appreciate any suggestions regarding Moz Crawler & Google / Bing index count!
Moz Bar | | Vukan_Simic0 -
Duplicate page content/page titles on tages
Hello everyone. New to the community and loving it already. Question, I am receiving an error of 6 pages with duplicate content and page titles. A majority of these are tag pages. Should I be worried about these? IN the column listed duplicate urls it is listing 0 ( screen shot: http://screencast.com/t/azvuVk0ucWt) Are these tags a problem? Will SEO be hurt because of this? What are TAG pages? Actually pages, categories, should I eliminate these?
Moz Bar | | Jasonalanmagic0 -
Ok, I have a strange duplicate content issue.
Just ran MOZ against my site and about fell of my chair, over 1k pages being flagged as having duplicate content. That's about every page on the site. I downloaded the CSV file and things got even more confusing, The URL's under "URL" and " Duplicate Page Content" area exactly the same. Am I missing something? I would expect the URL under Duplicate Page Content to show a different URL, to the page with the exact same content. Any help would be appreciated, traffic is tanking and I need to get this figured out. Thanks,
Moz Bar | | Trigger130