Why do my crawl diagnostics show duplicate content?
-
My crawl diagnostics show duplicate content at mysite.com and mysite.com/index.html, which are essentially the same file.
-
Michel is right - Google doesn't care that they're one template - if both URLs are being crawled, then they'll see that as two "pages". Every unique, crawlable URL can become an indexed page. That's why duplicate content problems are so common.
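The good news is that you can put a canonical tag on just the one template/file and it will cover all of the paths/URLs that land on that file. The tag goes in your <head> section and looks like this, with href set to whichever URL you prefer: <link rel="canonical" href="http://mysite.com/" />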
I'd check the internal links, though, and see if you're linking to both versions. It's best to use one, consistent URL in your internal links for any given page.
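If your pages are generated by a script, the "one tag on one template" idea can be sketched roughly like this in Python. The function names and the DEFAULT_DOCUMENTS list are made up for illustration, and the sketch assumes your server serves index.html (or index.htm) for a bare directory URL:

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Assumption: these are the file names your server serves for a bare "/" request.
DEFAULT_DOCUMENTS = ("index.html", "index.htm")


def canonical_url(url: str) -> str:
    """Collapse a default-document URL (e.g. /index.html) onto its directory URL."""
    parts = urlsplit(url)
    path = parts.path or "/"
    head, _, last = path.rpartition("/")
    if last.lower() in DEFAULT_DOCUMENTS:
        path = head + "/"
    # Keep the query string, drop the fragment, lowercase the host name.
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, parts.query, ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> tag to place in the page's <head>."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}" />'


if __name__ == "__main__":
    # Both variants of the home page produce the same tag.
    print(canonical_link_tag("http://mysite.com"))
    print(canonical_link_tag("http://mysite.com/index.html"))
```

Whichever URL the visitor or crawler requested, the template emits the same canonical tag, which is what lets one tag cover every path that lands on the file.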
-
mysite.com is a domain, not a file; mysite.com/index.html is the home page. I'm not sure how I would do what you suggest.
-
If the crawl report found those two URLs, then your website has at least one link to each of those URLs (otherwise Rogerbot wouldn't have found them).
You should follow Collin's advice to define the canonical page.
It also won't hurt to figure out where those links are being used in your content, and then make sure you only use one to point to your page.
Cheers
Michel
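To track down where each version is linked from, one option is to scan your own pages for links that resolve to either URL. This is only a minimal sketch using the Python standard library: the pages dictionary and the pages_linking_to helper are hypothetical, and it assumes you already have each page's HTML source to hand (for example, from saved copies of your site):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def pages_linking_to(pages, target_url):
    """Return the URLs of pages containing a link that resolves to target_url.

    pages maps each page's own URL to its HTML source, so relative links
    can be resolved against the page they appear on.
    """
    found = []
    for page_url, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        if any(urljoin(page_url, href) == target_url for href in collector.hrefs):
            found.append(page_url)
    return sorted(found)


if __name__ == "__main__":
    # Hypothetical sample pages: one links to index.html, the other to "/".
    pages = {
        "http://mysite.com/about.html": '<a href="index.html">Home</a>',
        "http://mysite.com/contact.html": '<a href="/">Home</a>',
    }
    print(pages_linking_to(pages, "http://mysite.com/index.html"))
    print(pages_linking_to(pages, "http://mysite.com/"))
```

Any page that shows up under the version you don't want is a place to edit the link so that all internal links use the one URL.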
-
"Essentially" the same file isn't the same as "the same file." Your best bet is probably to mark one of them (probably mysite.com) with rel=canonical.