Duplicate without user-selected canonical excluded
-
We have pdf files uploaded in the media of wordpress and used in our website. As these pdfs are duplicate content of the original publishers, we have marked links to these pdf urls as nofollow. These pages are also disallowed in robots.txt
Now, Google Search Console has shown these pages Excluded as "Duplicate without user-selected canonical"
As it comes out we cannot use canonical tag with pdf pages so as to point to the original pdf source
If we embed a pdf viewer in our website and fetch the pdfs by passing the urls of the original publisher, would the pdfs be still read as text by google and again create duplicate content issue? Another thing, when the pdf expires and is removed, it would lead to 404 error.
If we direct our users to the third party website, then it would add up to our bounce rate.
What should be the appropriate way to handle duplicate pdfs?
Thanks
-
From what I have read, so much of the web is duplicate content so it really doesn't matter if the pdf is on other sites; let google figure it out. (example, every car brand dealer has a pdf of the same car model brochure on their dealer site) No big deal. Visitors will be landing on your site from other search relevance - the duplicate pdf doesn't matter. Just my take. Adrian
-
Sorry, I mean pdf files only
-
As the pdf pages are marked as a duplicate and not the pdf files, then you should check which page has duplicate content compared to it, and take the needed measures (canonical tags or 301 redirect) form the page with less rank to the page with more rank. Alternatively, you can edit the content so that it isn't anymore duplicate.
If I had a link to the site and duplicate pages, I would be able to give you a more detailed response.
Daniel Rika - Dalerio Consulting
https://dalerioconsulting.com/
info@dalerioconsulting.com -
Hello Daniel
The pdfs are duplicates from another site.
The thing is that we have already disallowed the pdfs in the robots.txt file.
Now, what happened is this - We have a set of pages (let's call them content pages) which we had disallowed in the robots file as they had thin content. Those pages have links to their respective third party pdfs, which have been marked as nofollow. The pdfs are also disallowed in the robots file.
Few days back, we improved our content pages and removed them from robots file so that they can be indexed. Pdfs are still disallowed. Despite being disallowed, we have come across this issue with the pdf pages as "Duplicate without user-selected canonical."
I hope I make myself clear. Any insights now please.
-
If the pdfs are duplicate within your own site, then the best solution would be for you to link to the same document from different sources. Then you can delete the duplicated documents and 301 redirect them to the original.
If the pdfs are duplicate from another site, then disallowing them on robots.txt will stop them from being marked as a duplicate, as the crawler will not be able to access them at all. It will just take some time for them to be updated on google search console.
If however, you want to add canonical tags to the pdf documents (or other non-HTML documents), you can add it to the HTTP header through the .htaccess file. You can find a tutorial on how to do that in this article.
Daniel Rika - Dalerio Consulting
https://dalerioconsulting.com/
info@dalerioconsulting.com
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Canonical Issue On AMP
Hi everyone,
Intermediate & Advanced SEO | | MuhammadQasimAttari
I have one issue about canonical. kindly guide me about it. I have a site example.com/abc and I convert it on an amp and know its URLs is example.com/abc=?amp. but the search console tells me to add the proper canonical URL but both pages are the same. kindly guide me about it. what will I do?0 -
Big retailers and duplicate content
Hello there! I was wondering if you guys have experience with big retailers sites fetching data via API (PDP content etc.) from another domain which is also sharing the same data with other multiple sites. If each retailer has thousands on products, optimizing PDP content (even in batches) is quite of a cumbersome task and rel="canonical" pointing to original domain will dilute the value. How would you approach this type of scenario? Looking forward to read your suggestions/experiences Thanks a lot! Best Sara
Intermediate & Advanced SEO | | SaraCoppola1 -
Competition site with Duplicate Name
I have a client who is in real estate and one of his competition used the same Name and Domain and added the word Toronto in front. The competition is putting similar content on the website. The competition also hijacked the Google listing since they both work in the same building. Is this going to affect the ranking our website? What negative effect will it put on our website?
Intermediate & Advanced SEO | | KaranX0 -
Internal Duplicate Content Question...
We are looking for an internal duplicate content checker that is capable of crawling a site that has over 300,000 pages. We have looked over Moz's duplicate content tool and it seems like it is somewhat limited in how deep it crawls. Are there any suggestions on the best "internal" duplicate content checker that crawls deep in a site?
Intermediate & Advanced SEO | | tdawson091 -
Duplicate Content Pages - A Few Queries..
I am working through the latest Moz Crawl Report and focusing on the 'high priority' issues of Duplicate Page Content. There are some strange instances being flagged and so wondered whether anyone has any knowledge as to why this may be happening... Here is an example; This page; http://www.bolsovercruiseclub.com/destinations/cruise-breaks-&-british-isles/bruges/ ...is apparently duplicated with these pages; http://www.bolsovercruiseclub.com/guides/excursions http://www.bolsovercruiseclub.com/guides/cruises-from-the-uk http://www.bolsovercruiseclub.com/cruise-deals/norwegian-star-europe-cruise-deals Not sure why...? Also, pages that are on our 'Cruise Reviews' section such as this page; http://www.bolsovercruiseclub.com/cruise-reviews/p&o-cruises/adonia/cruising/931 ...are being flagged as duplicated content with a page like this; http://www.bolsovercruiseclub.com/destinations/cruise-breaks-&-british-isles/bilbao/ Is this a 'thin content' issue i.e. 2 pages have 'thin content' and are therefore duplicated? If so, the 'destinations' page can (and will be) rewritten with more content (and images) but the 'cruise reviews' are written by customers and so we are unable to do anything there... Hope that all makes sense?! Andy
Intermediate & Advanced SEO | | TomKing0 -
Duplicate keyphrases in page titles = penalty?
Hello Mozzers - just looking at a website which has duplicate keyphrases in its page titles... So you have [keyphrase 1] | [exact match Keyphrase 1] Now I happen to know this particular site has suffered a dramatic fall in traffic - the SEO agency working on the site had advised the client to duplicate keyphrases. Hard to believe, huh! What I'm wondering is whether this extensive exact match keyphrase duplication might've been enough to attract a penalty? Your thoughts would be welcome.
Intermediate & Advanced SEO | | McTaggart0 -
HTTPS in Rel Canonical
Hi, Should I, or do I need to, use HTTPS (note the "S") in my canonical tags? Thanks Andrew
Intermediate & Advanced SEO | | Studio330 -
Duplicate page content and duplicate pate title
Hi, i am running a global concept that operates with one webpage that has lot of content, the content is also available on different domains, but with in the same concept. I think i am getting bad ranking due to duplicate content, since some of the content is mirrored from the main page to the other "support pages" and they are almost 200 in total. Can i do some changes to work around this or am i just screwed 🙂
Intermediate & Advanced SEO | | smartmedia0