Duplicate content check picking up weird urls
-
Hi everyone,
I love the duplicate content feature; we have a lot of duplicate content issues due to the way our site is structured. So, we're working on them. However, I'm not fully understanding the results. For example, say I have an article on breast cancer symptoms. It shows up as duplicate content, by having two urls that point to the exact same page. http://www.healthchoices.ca/articles/breast cancer symptoms and http://www.healthchoices.ca/somerandomstringofcode. I fully understand why that is duplicate content.
I am not sure about this though, it picks up the same url twice and calls it duplicate content. For example, saying that http://www.healthchoices.ca/dr.-so-and-so and http://www.healthchoices.ca/dr.-so-and-so is duplicate...however is this not the same page? Is there something I'm missing? Many of the URL's are identical.
Thanks,
Erin
-
Hi Erin -
Is that a Google Webmaster file?
Looking at those URLs in SERPS, it seems you have some content causing duplicates (although the file doesnt seem to represent it that way).
Here's the URLs in Google search results for Term-Life-Insurance:
- http://www.healthchoices.ca/video/insurance-and-disability-planning/term-life-insurance
- http://www.healthchoices.ca/video/insurance-and-disability-planning/term-life-insurance/montreal/quebec (duplicate of previous)
- http://www.healthchoices.ca/video-link/insurance-and-disability-planning/Term-Life-Insurance
- http://www.healthchoices.ca/video/insurance-and-disability-planning/term-life-insurance/laval/quebec (duplicate of previous)
Looking at the first two as an example, when you look at th pages themselves they are currently not exact duplicates. The first one is a video of a guy talking about term life insurance with some other video links, and the second page is a page that has an error "Error: Video Category Page is currently unavailable." where the page content should be. But that page had previously been an exact duplicate of the first URL the last time Google visited the page.
Here is the first page again:
http://www.healthchoices.ca/video/insurance-and-disability-planning/term-life-insurance
Here is the cached version of the second (duplicate) page (as I'm currently seeing it, it was last cached on Apr 19, 2011):
To see these pages (or any potential duplicate URL issues), do this search in Google:
- site:www.healthchoices.ca
- To find pages with a specific URL pattern (like the term life insurance pages) try "site:www.healthchoices.ca inurl:Term-Life-Insurance" (without the quotation marks)
- Then at the end of the URL you see in the address bar, add "&filter=0" (without the quoutes).
So what is in your browser address bar would look like this (although it may have some additional thinkgs in your URL like your previous query and your browser and language for example - that's ok):
http://www.google.com/search?q=site:www.healthchoices.ca+inurl:Term-Life-Insurance&filter=0
I'm not sure what the URL issue is that you're referring to exactly based on the info you pasted and where you may have gotten it from - but I hope this is helpful.
-
Hi Erin,
Can I enquire a little more about where you are lifting these URLs from. I'm assuming you are downloading them from a Campaign? Are the URLs in question lifted from the same row in the CSV? What is the header of the columns they are lifted from? Just need a little more specificity about what we're looking at here in order to respond fully.
-
Thanks for your responses. Hmm...I'm not sure how to do a screen shot as the only way I could see the errors was to download the file. I've pasted a few below straight from the doc
<colgroup><col width="775"><col width="968"></colgroup>
| www.healthchoices.ca/video/ice-sports/default | www.healthchoices.ca/video/ice-sports/default |
| www.healthchoices.ca/video/insurance-and-disability-planning/Key-Man-Insurance | www.healthchoices.ca/video/insurance-and-disability-planning/Key-Man-Insurance |
| www.healthchoices.ca/video/insurance-and-disability-planning/Long-Term-Care-Coverage | www.healthchoices.ca/video/insurance-and-disability-planning/Long-Term-Care-Coverage |
| www.healthchoices.ca/video/insurance-and-disability-planning/Term-Life-Insurance | www.healthchoices.ca/video/insurance-and-disability-planning/Term-Life-Insurance |
| www.healthchoices.ca/video/insurance-and-disability-planning/default | www.healthchoices.ca/video/insurance-and-disability-planning/default | -
Erin, what tool are you using to find this? It might be something to do with the language that your CMS is written in - it might also be a matter of a trailing slash or a non www. version.
I'd be happy to help if you could provide a little more info, perhaps a screen shot?
Aaron
-
Duplicate content by definition is having the same content on different URL's. I've never had the tool tell me I have duplicate content on the same URL. You must be missing something. Is it www vs non-www perhaps? I don't know how you can get identical url's showing up in there.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Duplicate Content and Subdirectories
Hi there and thank you in advance for your help! I'm seeking guidance on how to structure a resources directory (white papers, webinars, etc.) while avoiding duplicate content penalties. If you go to /resources on our site, there is filter function. If you filter for webinars, the URL becomes /resources/?type=webinar We didn't want that dynamic URL to be the primary URL for webinars, so we created a new page with the URL /resources/webinar that lists all of our webinars and includes a featured webinar up top. However, the same webinar titles now appear on the /resources page and the /resources/webinar page. Will that cause duplicate content issues? P.S. Not sure if it matters, but we also changed the URLs for the individual resource pages to include the resource type. For example, one of our webinar URLs is /resources/webinar/forecasting-your-revenue Thank you!
Technical SEO | | SAIM_Marketing0 -
URL slash creating duplicate content
Hi All, I currently have an issue whereby by domain name (just homepage) has: mydomain.com and: mydomain.com/ Moz crawler flags this up as duplicate content - does anyone know of a way I can fix this? Thanks! Jack
Technical SEO | | Jack11660 -
Email and landing page duplicate content issue?
Hi Mozers, my question is, if there is a web based email that goes to subscribers, then if they click on a link it lands on a Wordpress page with very similar content, will Google penalize us for duplicate content? If so is the best workaround to make the email no index no follow? Thanks!
Technical SEO | | CalamityJane770 -
Best Way to Handle Near-Duplicate Content?
Hello Dear MOZers, Having duplicate content issues and I'd like some opinions on how best to deal with this problem. Background: I run a website for a cosmetic surgeon in which the most valuable content area is the section of before/after photos of our patients. We have 200+ pages (one patient per page) and each page has a 'description' block of text and a handful of before and after photos. Photos are labeled with very similar labels patient-to-patient ("before surgery", "after surgery", "during surgery" etc). Currently, each page has a unique rel=canonical tag. But MOZ Crawl Diagnostics has found these pages to be duplicate content of each other. For example, using a 'similar page checker' two of these pages were found to be 97% similar. As far as I understand there are a few ways to deal with this, and I'd like to get your opinions on the best course. Add 150+ more words to each description text block Prevent indexing of patient pages with robots.txt Set the rel=canonical for each patient page to the main gallery page Any other options or suggestions? Please keep in mind that this is our most valuable content, so I would be reluctant to make major structural changes, or changes that would result in any decrease in traffic to these pages. Thank you folks, Ethan
Technical SEO | | BernsteinMedicalNYC0 -
Weird, long URLS returning crawl error
Hi everyone, I'm getting a crawl error "URL too long" for some really strange urls that I'm not sure where they are being generated from or how to resolve it. It's all with one page, our request info. Here are some examples: http://studyabroad.bridge.edu/request-info/?program=request info > ?program=request info > ?program=request info > ?program=request info > ?program=programs > ?country=country?type=internships&term=short%25 http://studyabroad.bridge.edu/request-info/?program=request info > ?program=blog > notes from the field tefl student elaina h in chile > ?utm_source=newsletter&utm_medium=article&utm_campaign=notes%2Bfrom%2Bthe%2Bf Has anyone seen anything like this before or have an idea of what may be causing it? Thanks so much!
Technical SEO | | Bridge_Education_Group0 -
Duplicate url problem causing me problems
Hi, i am working with a joomla site and i am using the sh404sef plugin. I have contacted the developer of the plugin who has not been very helpful so i am hoping to get help here. The problem i am having is, the description of the page showing in google listings is not the same as what i have put into the meta tag description. for example, for this page http://www.clairehegarty.co.uk/virtual-gastric-band-with-hypnotherapy the meta tag description should be Gastric Band Hypnotherapy to lose weight guaranteed. Free Gastric Band Hypnosis Consultations with Well Known Gastric Hypno Band expert as seen on TV. Hypno Gastric Band Works. We offer full support after your Gastric Band Hypnotherapy but in google it is showing Gastric Band Hypnotherapy Works. If you would like a slimmer and healthier body with all the benefits of weight loss surgery without any of the risks that can be ... now one thing i have noticed is: in the sh404sef control panel, i have noticed that i have the following index.php?option=com_content&Itemid=190&id=153&lang=en&view=article the above is the original url from day one but then i have the one below which is not the original index.php?option=com_content&Itemid=190&catid=150&id=153&lang=en&view=article i keep deleting the above which is not the original but it keeps coming back and i have been told this could be the fault can anyone please help me with this and solve how to stop it from coming back so google shows the correct description please.
Technical SEO | | ClaireH-1848860 -
Issue: Duplicate Page Content
Hi All, I am getting warnings about duplicate page content. The pages are normally 'tag' pages. I have some blog posts tagged with multiple 'tags'. Does it really affect my site?. I am using wordpress and Yoast SEO plugin. Thanks
Technical SEO | | KLLC0 -
Duplicate Content
The crawl shows a lot of duplicate content on my site. Most of the urls its showing are categories and tags (wordpress). so what does this mean exactly? categories is too much like other categories? And how do i go about fixing this the best way. thanks
Technical SEO | | vansy0