Duplicate without user-selected canonical excluded
-
We have PDF files uploaded to the WordPress media library and used on our website. As these PDFs are duplicates of the original publishers' content, we have marked the links to these PDF URLs as nofollow. The pages are also disallowed in robots.txt.
Now, Google Search Console shows these pages as Excluded with the status "Duplicate without user-selected canonical".
As it turns out, we cannot use a canonical tag on PDF pages to point to the original PDF source.
If we embed a PDF viewer on our website and fetch the PDFs by passing the URLs of the original publisher, would the PDFs still be read as text by Google and create a duplicate content issue again? Another concern: when a PDF expires and is removed, it would lead to a 404 error.
If we direct our users to the third-party website, it would increase our bounce rate.
What should be the appropriate way to handle duplicate pdfs?
Thanks
-
From what I have read, so much of the web is duplicate content that it really doesn't matter if the PDF is on other sites; let Google figure it out. (For example, every dealer for a car brand hosts a PDF of the same model brochure on their dealer site.) No big deal. Visitors will land on your site through other search relevance; the duplicate PDF doesn't matter. Just my take. Adrian
-
Sorry, I mean PDF files only
-
As the PDF pages are marked as duplicates (not the PDF files themselves), you should check which page has duplicate content compared to them, and take the needed measures (canonical tags or a 301 redirect) from the lower-ranking page to the higher-ranking one. Alternatively, you can edit the content so that it is no longer duplicate.
If I had a link to the site and the duplicate pages, I would be able to give you a more detailed response.
Daniel Rika - Dalerio Consulting
https://dalerioconsulting.com/
info@dalerioconsulting.com
-
Hello Daniel
The PDFs are duplicates from another site.
The thing is that we have already disallowed the PDFs in the robots.txt file.
Here is what happened: we have a set of pages (let's call them content pages) that we had disallowed in robots.txt because they had thin content. Those pages link to their respective third-party PDFs, and the links are marked as nofollow. The PDFs are also disallowed in robots.txt.
A few days back, we improved our content pages and removed them from robots.txt so that they can be indexed. The PDFs are still disallowed. Despite that, we have run into this issue with the PDF pages being reported as "Duplicate without user-selected canonical."
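For reference, disallow rules like the ones described above might look something like this in robots.txt (the upload path is a made-up example, assuming the default WordPress uploads directory):

```text
User-agent: *
# Block crawling of uploaded PDF files (example path)
Disallow: /wp-content/uploads/*.pdf$
```

Note that Disallow only prevents crawling; URLs Google discovered before the rule was in place can keep an indexed or excluded status in Search Console until they are reprocessed.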
I hope I make myself clear. Any insights now please.
-
If the PDFs are duplicated within your own site, then the best solution is to link to the same document from the different sources. Then you can delete the duplicated documents and 301 redirect them to the original.
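On an Apache-hosted WordPress site, that kind of redirect could be sketched in .htaccess like this (both file paths are hypothetical examples):

```apache
# Permanently redirect the duplicated PDF to the original copy
Redirect 301 "/wp-content/uploads/2021/03/brochure-copy.pdf" "https://www.example.com/wp-content/uploads/2020/07/brochure.pdf"
```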
If the PDFs are duplicates from another site, then disallowing them in robots.txt will stop them from being marked as duplicates, as the crawler will not be able to access them at all. It will just take some time for the status to be updated in Google Search Console.
If, however, you want to add canonical tags to the PDF documents (or other non-HTML documents), you can add them to the HTTP response header through the .htaccess file. You can find a tutorial on how to do that in this article.
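A minimal sketch of that approach, assuming Apache with mod_headers enabled (the file name and target URL are hypothetical):

```apache
# Requires mod_headers; send a cross-domain canonical for a specific PDF
<Files "whitepaper.pdf">
  # Point search engines at the original publisher's copy via a Link header
  Header add Link '<https://publisher.example.com/docs/whitepaper.pdf>; rel="canonical"'
</Files>
```

You can verify the header is being sent by fetching the PDF with `curl -I` and checking for the `Link:` line in the response.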
Daniel Rika - Dalerio Consulting
https://dalerioconsulting.com/
info@dalerioconsulting.com