Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Duplicate without user-selected canonical excluded
-
We have pdf files uploaded in the media of wordpress and used in our website. As these pdfs are duplicate content of the original publishers, we have marked links to these pdf urls as nofollow. These pages are also disallowed in robots.txt
Now, Google Search Console has shown these pages Excluded as "Duplicate without user-selected canonical"
As it comes out we cannot use canonical tag with pdf pages so as to point to the original pdf source
If we embed a pdf viewer in our website and fetch the pdfs by passing the urls of the original publisher, would the pdfs be still read as text by google and again create duplicate content issue? Another thing, when the pdf expires and is removed, it would lead to 404 error.
If we direct our users to the third party website, then it would add up to our bounce rate.
What should be the appropriate way to handle duplicate pdfs?
Thanks
-
From what I have read, so much of the web is duplicate content so it really doesn't matter if the pdf is on other sites; let google figure it out. (example, every car brand dealer has a pdf of the same car model brochure on their dealer site) No big deal. Visitors will be landing on your site from other search relevance - the duplicate pdf doesn't matter. Just my take. Adrian
-
Sorry, I mean pdf files only
-
As the pdf pages are marked as a duplicate and not the pdf files, then you should check which page has duplicate content compared to it, and take the needed measures (canonical tags or 301 redirect) form the page with less rank to the page with more rank. Alternatively, you can edit the content so that it isn't anymore duplicate.
If I had a link to the site and duplicate pages, I would be able to give you a more detailed response.
Daniel Rika - Dalerio Consulting
https://dalerioconsulting.com/
info@dalerioconsulting.com -
Hello Daniel
The pdfs are duplicates from another site.
The thing is that we have already disallowed the pdfs in the robots.txt file.
Now, what happened is this - We have a set of pages (let's call them content pages) which we had disallowed in the robots file as they had thin content. Those pages have links to their respective third party pdfs, which have been marked as nofollow. The pdfs are also disallowed in the robots file.
Few days back, we improved our content pages and removed them from robots file so that they can be indexed. Pdfs are still disallowed. Despite being disallowed, we have come across this issue with the pdf pages as "Duplicate without user-selected canonical."
I hope I make myself clear. Any insights now please.
-
If the pdfs are duplicate within your own site, then the best solution would be for you to link to the same document from different sources. Then you can delete the duplicated documents and 301 redirect them to the original.
If the pdfs are duplicate from another site, then disallowing them on robots.txt will stop them from being marked as a duplicate, as the crawler will not be able to access them at all. It will just take some time for them to be updated on google search console.
If however, you want to add canonical tags to the pdf documents (or other non-HTML documents), you can add it to the HTTP header through the .htaccess file. You can find a tutorial on how to do that in this article.
Daniel Rika - Dalerio Consulting
https://dalerioconsulting.com/
info@dalerioconsulting.com
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Canonical Chain
This is quite advanced so maybe Rand can give me an answer? I often have seen questions surrounding a 301 chain where only 85% of the link juice is passed on to the first target and 85% of that to the next one, up to three targets. But how about a canonical chain? What do I mean by this:? I have a client who sells lighting so I will use a real example (sans domain) I don't want 'new-product' pages appearing in SERPS. They dilute link equity for the categories they replicate and often contain identical products to the main categories and subcategories. I don't want to no index them all together I'd rather tell Google they are the same as the higher category/sub category. (discussion whether a noindex/follow tag would be better?) If I canonicalize new-products/ceiling-lights-c1/kitchen-lighting-c17/kitchen-ceiling-lights-c217 to /ceiling-lights-c1/kitchen-lighting-c17/kitchen-ceiling-lights-c217 I then subsequently discover that everything in kitchen-ceiling-lights-c217 is already in /kitchen-lighting-c17 and I decide to canonicalize those two - so I place a /kitchen-lighting-c17 canonical on /kitchen-ceiling-lights-c217. Then what happens to the new-products canonical? Is it the same rule - does it pass 85% of link equity back to the non new-product URL and 85% of that back to the category? does it just not work? or should I do noindexi/follow Now before you jump in: Let's assume these are done over a period of time because the obvious answer is: Canonicalize both back to /ceiling-lights-c1/kitchen-lighting-c17 I know that and that is not what I am asking. What if they are done in a sequence what is the real result? I don't want to patronise anyone but please read this carefully before giving an answer. Regards Nigel Carousel Projects.
Intermediate & Advanced SEO | | Nigel_Carr0 -
Can I use duplicate content in different US cities without hurting SEO?
So, I have major concerns with this plan. My company has hundreds of facilities located all over the country. Each facility has it's own website. We have a third party company working to build a content strategy for us. What they came up with is to create a bank of content specific to each service line. If/when any facility offers that service, they then upload the content for that service line to that facility website. So in theory, you might have 10-12 websites all in different cities, with the same content for a service. They claim "Google is smart, it knows its content all from the same company, and because it's in different local markets, it will still rank." My contention is that duplicate content is duplicate content, and unless it is "localize" it, Google is going to prioritize one page of it and the rest will get very little exposure in the rankings no matter where you are. I could be wrong, but I want to be sure we aren't shooting ourselves in the foot with this strategy, because it is a major major undertaking and too important to go off in the wrong direction. SEO Experts, your help is genuinely appreciated!
Intermediate & Advanced SEO | | MJTrevens1 -
Rel=canonical and internal links
Hi Mozzers, I was musing about rel=canonical this morning and it occurred to me that I didnt have a good answer to the following question: How does applying a rel=canonical on page A referencing page B as the canonical version affect the treatment of the links on page A? I am thinking of whether those links would get counted twice, or in the case of ver-near-duplicates which may have an extra sentence which includes an extra link, whther that extra link would count towards the internal link graph or not. I suspect that google would basically ignore all the content on page A and only look to page B taking into account only page Bs links. Any thoughts? Thanks!
Intermediate & Advanced SEO | | unirmk0 -
Partial duplicate content and canonical tags
Hi - I am rebuilding a consumer website, and each product page will contain a unique product image, and a sentence or two about the product (and we tend to use a lot of the same words in different ways across products). I'd like to have a tabbed area below the product info that talks about the overall product line, and this content would be duplicate across all the product pages (a "Why use our products" type of thing). I'd have this duplicate content also living on its own URL's so they can be found alone in the SERP's. Question is, do I need to add the canonical tag to this page, since there's partial duplicate content on the product pages? And if I did that, would my product pages go un-indexed?? I understand how to handle completely duplicated content, it's the partial duplicate that I'm having difficulty figuring out.
Intermediate & Advanced SEO | | Jenny10 -
Tabs and duplicate content?
We own this site http://www.discountstickerprinting.co.uk/ and just a little concerned as I right clicked open in new tab on the tab content section and it went to a new page For example if you right click on the price tab and click open in new tab you will end up with the url
Intermediate & Advanced SEO | | BobAnderson
http://www.discountstickerprinting.co.uk/#tabThree Does this mean that our content is being duplicated onto another page? If so what should I do?0 -
How to Remove Joomla Canonical and Duplicate Page Content
I've attempted to follow advice from the Q&A section. Currently on the site www.cherrycreekspine.com, I've edited the .htaccess file to help with 301s - all pages redirect to www.cherrycreekspine.com. Secondly, I'd added the canonical statement in the header of the web pages. I have cut the Duplicate Page Content in half ... now I have a remaining 40 pages to fix up. This is my practice site to try and understand what SEOmoz can do for me. I've looked at some of your videos on Youtube ... I feel like I'm scrambling around to the Q&A and the internet to understand this product. I'm reading the beginners guide.... any other resources would be helpful.
Intermediate & Advanced SEO | | deskstudio0 -
Canonical URLs and Sitemaps
We are using canonical link tags for product pages in a scenario where the URLs on the site contain category names, and the canonical URL points to a URL which does not contain the category names. So, the product page on the site is like www.example.com/clothes/skirts/skater-skirt-12345, and also like www.example.com/sale/clearance/skater-skirt-12345 in another category. And on both of these pages, the canonical link tag references a 3rd URL like www.example.com/skater-skirt-12345. This 3rd URL, used in the canonical link tag is a valid page, and displays the same content as the other two versions, but there are no actual links to this generic version anywhere on the site (nor external). Questions: 1. Does the generic URL referenced in the canonical link also need to be included as on-page links somewhere in the crawled navigation of the site, or is it okay to be just a valid URL not linked anywhere except for the canonical tags? 2. In our sitemap, is it okay to reference the non-canonical URLs, or does the sitemap have to reference only the canonical URL? In our case, the sitemap points to yet a 3rd variation of the URL, like www.example.com/product.jsp?productID=12345. This page retrieves the same content as the others, and includes a canonical link tag back to www.example.com/skater-skirt-12345. Is this a valid approach, or should we revise the sitemap to point to either the category-specific links or the canonical links?
Intermediate & Advanced SEO | | 379seo0 -
Rel=canonical tag on original page?
Afternoon All,
Intermediate & Advanced SEO | | Jellyfish-Agency
We are using Concrete5 as our CMS system, we are due to change but for the moment we have to play with what we have got. Part of the C5 system allows us to attribute our main page into other categories, via a page alaiser add-on. But what it also does is create several url paths and duplicate pages depending on how many times we take the original page and reference it in other categories. We have tried C5 canonical/SEO add-on's but they all seem to fall short. We have tried to address this issue in the most efficient way possible by using the rel=canonical tag. The only issue is the limitations of our cms system. We add the canonical tag to the original page header and this will automatically place this tag on all the duplicate pages and in turn fix the problem of duplicate content. The only problem is the canonical tag is on the original page as well, but it is referencing itself, effectively creating a tagging circle. Does anyone foresee a problem with the canonical tag being on the original page but in turn referencing itself? What we have done is try to simplify our duplicate content issues. We have over 2500 duplicate page issues because of this aliasing add-on and want to automate the canonical tag addition, rather than go to each individual page and manually add this tag, so the original reference page can remain the original. We have implemented this tag on one page at the moment with 9 duplicate pages/url's and are monitoring, but was curious if people had experienced this before or had any thoughts?0