Duplicate without user-selected canonical excluded
-
We have pdf files uploaded in the media of wordpress and used in our website. As these pdfs are duplicate content of the original publishers, we have marked links to these pdf urls as nofollow. These pages are also disallowed in robots.txt
Now, Google Search Console has shown these pages Excluded as "Duplicate without user-selected canonical"
As it comes out we cannot use canonical tag with pdf pages so as to point to the original pdf source
If we embed a pdf viewer in our website and fetch the pdfs by passing the urls of the original publisher, would the pdfs be still read as text by google and again create duplicate content issue? Another thing, when the pdf expires and is removed, it would lead to 404 error.
If we direct our users to the third party website, then it would add up to our bounce rate.
What should be the appropriate way to handle duplicate pdfs?
Thanks
-
From what I have read, so much of the web is duplicate content so it really doesn't matter if the pdf is on other sites; let google figure it out. (example, every car brand dealer has a pdf of the same car model brochure on their dealer site) No big deal. Visitors will be landing on your site from other search relevance - the duplicate pdf doesn't matter. Just my take. Adrian
-
Sorry, I mean pdf files only
-
As the pdf pages are marked as a duplicate and not the pdf files, then you should check which page has duplicate content compared to it, and take the needed measures (canonical tags or 301 redirect) form the page with less rank to the page with more rank. Alternatively, you can edit the content so that it isn't anymore duplicate.
If I had a link to the site and duplicate pages, I would be able to give you a more detailed response.
Daniel Rika - Dalerio Consulting
https://dalerioconsulting.com/
info@dalerioconsulting.com -
Hello Daniel
The pdfs are duplicates from another site.
The thing is that we have already disallowed the pdfs in the robots.txt file.
Now, what happened is this - We have a set of pages (let's call them content pages) which we had disallowed in the robots file as they had thin content. Those pages have links to their respective third party pdfs, which have been marked as nofollow. The pdfs are also disallowed in the robots file.
Few days back, we improved our content pages and removed them from robots file so that they can be indexed. Pdfs are still disallowed. Despite being disallowed, we have come across this issue with the pdf pages as "Duplicate without user-selected canonical."
I hope I make myself clear. Any insights now please.
-
If the pdfs are duplicate within your own site, then the best solution would be for you to link to the same document from different sources. Then you can delete the duplicated documents and 301 redirect them to the original.
If the pdfs are duplicate from another site, then disallowing them on robots.txt will stop them from being marked as a duplicate, as the crawler will not be able to access them at all. It will just take some time for them to be updated on google search console.
If however, you want to add canonical tags to the pdf documents (or other non-HTML documents), you can add it to the HTTP header through the .htaccess file. You can find a tutorial on how to do that in this article.
Daniel Rika - Dalerio Consulting
https://dalerioconsulting.com/
info@dalerioconsulting.com
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Photogallery Metadescription - Duplicate or Unique?
Hi There,Got a few question regarding SEO for Photogalleries. - What are the best practices in terms of SEO whenever we have a gallery of pictures and each picture has a different URL? Currently, we are using a unique metadescription that is on the first page of the gallery, then for the remaining ones, we are using the captions of the pictures as metadescriptions.Is the the good way to go? Is it better to have short metadescriptions (being the captions of our pictures) or duplicate metadescriptions (the same one across the pictures)?- is there any other recommendations used by other websites?Thanks!
Intermediate & Advanced SEO | | seaounousa0 -
With or without the "www." ?
Is there any benefit whatsoever to having the www. in the URL?
Intermediate & Advanced SEO | | JordanBrown0 -
Using Canonical on home page
Our home page has the canonical tag pointing to itself (something from wordpress i understand). Is there any positive or negative affect that anyone is aware of from having pages canonical'ed to themselves?
Intermediate & Advanced SEO | | halloranc0 -
Canonical Tag for Pages with Less Content
I am considering using a cross-domain canonical tag for pages that are very similar but one has less content than the other. The domains are geo specific, so for example. www.page.com - with content xxx, yyy, zzz, and www.page.fr with content xxx is this a problem because while there is clearly duplicate content here the pages are not actually significantly similar since there is so much less content on one page than the other?
Intermediate & Advanced SEO | | theLotter0 -
Duplicate content in Webmaster tools, is this bad?
We launched a new site, and we did a 301 redirect to every page. I have over 5k duplicate meta tags and title tags. It shows the old page and the new page as having the same title tag and meta description. This isn't true, we changed the titles and meta description, but it still shows up like that. What would cause that?
Intermediate & Advanced SEO | | EcommerceSite0 -
Rel=canonical tag on original page?
Afternoon All,
Intermediate & Advanced SEO | | Jellyfish-Agency
We are using Concrete5 as our CMS system, we are due to change but for the moment we have to play with what we have got. Part of the C5 system allows us to attribute our main page into other categories, via a page alaiser add-on. But what it also does is create several url paths and duplicate pages depending on how many times we take the original page and reference it in other categories. We have tried C5 canonical/SEO add-on's but they all seem to fall short. We have tried to address this issue in the most efficient way possible by using the rel=canonical tag. The only issue is the limitations of our cms system. We add the canonical tag to the original page header and this will automatically place this tag on all the duplicate pages and in turn fix the problem of duplicate content. The only problem is the canonical tag is on the original page as well, but it is referencing itself, effectively creating a tagging circle. Does anyone foresee a problem with the canonical tag being on the original page but in turn referencing itself? What we have done is try to simplify our duplicate content issues. We have over 2500 duplicate page issues because of this aliasing add-on and want to automate the canonical tag addition, rather than go to each individual page and manually add this tag, so the original reference page can remain the original. We have implemented this tag on one page at the moment with 9 duplicate pages/url's and are monitoring, but was curious if people had experienced this before or had any thoughts?0 -
Duplicate content for swatches
My site is showing a lot of duplicate content on SEOmoz. I have discovered it is because the site has a lot of swatches (colors for laminate) within iframes. Those iframes have all the same content except for the actual swatch image and the title of the swatch. For example, these are two of the links that are showing up with duplicate content: http://www.formica.com/en/home/dna.aspx?color=3691&std=1&prl=PRL_LAMINATE&mc=0&sp=0&ots=&fns=&grs= http://www.formica.com/en/home/dna.aspx?color=204&std=1&prl=PRL_LAMINATE&mc=0&sp=0&ots=&fns=&grs= I do want each individual swatch to show up in search results and they currently are if you search for the exact swatch name. Is the fact that they all have duplicate content affecting my individual rankings and my domain authority? What can I do about it? I can't really afford to put unique content on each swatch page so is there another way to get around it? Thanks!
Intermediate & Advanced SEO | | AlightAnalytics0 -
301 redirect for duplicate content
Hey, I have just started working on a site which is a video based city guide, with promotional videos for restaurants, bars, activities,etc. The first thing that I have noticed is that every video on the site has two possible urls:- http://www.domain.com/venue.php?url=rosemarino
Intermediate & Advanced SEO | | AdeLewis
http://www.domain.com/venue/rosemarino I know that I can write a .htaccess line to redirect one to the other:- redirect 301 /venue.php?url=rosemarino http://www.domain.com/venue/rosemarino but this would involve creating a .htaccess line for every video on the site and new videos that get added may get missed. Does anyone know a way of creating a rule to rewrite these urls? Any help would be most gratefully received. Thanks. Ade.0