PDF on financial site that duplicates ~50% of site content
-
I have a financial advisor client who has a downloadable PDF on his site that contains about 9 pages of good info. Problem is much of the content can also be found on individual pages of his site.
Is it best to noindex/follow the pdf? It would be great to let the few pages of original content be crawlable, but I'm concerned about the duplicate content aspect.
Thanks --
-
This is what we have done with pdfs. Assign rel="canonical" in .htaccess.
We did this with a few hundred files and it took google a LONG time to find and credit them.
-
You could set the header to noindex rather than rel=canonical
-
Personally I think it would be better not to index, it but if necessary, the index folder root seems like a good option
-
Thanks. Anybody want to weigh in on where to rel=canonical to? Home page?
-
If you are using apache, you should put it on your .htaccess with this form
<filesmatch “my-file.pdf”="">Header set Link ‘<http: misite="" my-file.html="">; rel=”canonical”‘</http:></filesmatch>
-
I think the right way here is to put the rel canonical in PDF header http://googlewebmastercentral.blogspot.com/2011/06/supporting-relcanonical-http-headers.html
-
I thought the idea was to put rel=canonical on the duplicated page, to signal that "hey, this page may look like duplicate content, but please refer to this canonical URL"?
Looks like there is a pdf option for rel=canonical, I guess the question is, what page on the site to make canonical?
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139394
Indicate the canonical version of a URL by responding with the
Link rel="canonical"
HTTP header. Addingrel="canonical"
to thehead
section of a page is useful for HTML content, but it can't be used for PDFs and other file types indexed by Google Web Search. In these cases you can indicate a canonical URL by responding with theLink rel="canonical"
HTTP header, like this (note that to use this option, you'll need to be able to configure your server):Link: <http: www.example.com="" downloads="" white-paper.pdf="">; rel="canonical"</http:>
-
Hi Keith,
I'm sorry, I should have clarified. The rel=canonical tags would be on your Web pages, not the PDF (they are irrelevant in a PDF document). Then Google will attribute your Web page as the original source of the content and will understand that the PDF just contains bits of content from those pages. In this instance I would include a rel=canonical tag on every page of your site, just to cover your bases. Hope that helps!
Dana
-
Not sure which page I would mark as being canonical, since the pdf contains content from several different pages on the site. I don't think it's possible to assign different rel=canonical tags to separate portions of a pdf, is it?
-
As long as you have rel=canonical tags properly in place, you don't need to worry about the PDF causing duplicate content problems. That way, any original content should be picked up and any duplicate can be attributed to your existing Web pages. Hope that's helpful!
Dana
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Same content, different languages. Duplicate content issue? | international SEO
Hi, If the "content" is the same, but is written in different languages, will Google see the articles as duplicate content?
Intermediate & Advanced SEO | | chalet
If google won't see it as duplicate content. What is the profit of implementing the alternate lang tag?Kind regards,Jeroen0 -
Tools to scan entire site for duplicate content?
HI guys, Just wondering if anyone knows of any tools to scan a site for duplicate content (with other sites on the web). Looking to quickly identify product pages containing duplicate content/duplicate product descriptions for E-commerce based websites. I know copy scape can which can check up to 10,000 pages in a single operation with Batch Search. But just wondering if there is anything else on the market i should consider looking at? Cheers, Chris
Intermediate & Advanced SEO | | jayoliverwright0 -
If a website trades internationally and simply translates its online content from English to French, German, etc how can we ensure no duplicate content penalisations and still maintain SEO performance in each territory?
Most of the international sites are as below: example.com example.de example.fr But some countries are on unique domains such example123.rsa
Intermediate & Advanced SEO | | Dave_Schulhof0 -
Duplicate content issue - online retail site.
Hello Mozzers, just looked at a website and just about every product page (there are hundreds - yikes!) is duplicated like this at end of each url (see below). Surely this is a serious case of duplicate content? Any idea why a web developer would do this? Thanks in advance! Luke prod=company-081
Intermediate & Advanced SEO | | McTaggart
prod=company-081&cat=20 -
Copying my Facebook content to website considered duplicate content?
I write career advice on Facebook on a daily basis. On my homepage users can see the most recent 4-5 feeds (using FB social media plugin). I am thinking to create a page on my website where visitors can see all my previous FB feeds. Would this be considered duplicate content if I copy paste the info, but if I use a Facebook social media plugin then it is not considered duplicate content? I am working on increasing content on my website and feel incorporating FB feeds would make sense. thank you
Intermediate & Advanced SEO | | knielsen0 -
Duplicate Content on Product Pages
I'm getting a lot of duplicate content errors on my ecommerce site www.outdoormegastore.co.uk mainly centered around product pages. The products are completely different in terms of the title, meta data, product descriptions and images (with alt tags)but SEOmoz is still identifying them as duplicates and we've noticed a significant drop in google ranking lately. Admittedly the product descriptions are a little bit thin but I don't understand why the pages would be viewed as duplicates and therefore can be ranked lower? The content is definitely unique too. As an example these three pages have been identified as being duplicates of each other. http://www.outdoormegastore.co.uk/regatta-landtrek-25l-rucksack.html http://www.outdoormegastore.co.uk/canyon-bryce-adult-cycling-helmet-9045.html http://www.outdoormegastore.co.uk/outwell-minnesota-6-carpet-for-green-07-08-tent.html
Intermediate & Advanced SEO | | gavinhoman0 -
HTTPS Duplicate Content?
I just recieved a error notification because our website is both http and https. http://www.quicklearn.com & https://www.quicklearn.com. My tech tells me that this isn't actually a problem? Is that true? If not, how can I address the duplicate content issue?
Intermediate & Advanced SEO | | QuickLearnTraining0 -
Nuanced duplicate content problem.
Hi guys, I am working on a recently rebuilt website, which has some duplicate content issues that are more nuanced than usual. I have a plan of action (which I will describe further), so please let me know if it's a valid plan or if I am missing something. Situation: The client is targeting two types of users: business leads (Type A) and potential employees (Type B), so for each of their 22 locations, they have 2 pages - one speaking to Type A and another to Type B. Type A location page contains a description of the location. In terms of importance, Type A location pages are secondary because to the Type A user, locations are not of primary importance. Type B location page contains the same description of the location plus additional lifestyle description. These pages carry more importance, since they are attempting to attract applicants to work in specific places. So I am planning to rank these pages eventually for a combination of Location Name + Keyword. Plan: New content is not an option at this point, so I am planning to set up canonical tags on both location Types and make Type B, the canonical URL, since it carries more importance and more SEO potential. The main nuance is that while Type A and Type B location pages contain some of the same content (about 75%-80%), they are not exactly the same. That is why I am not 100% sure that I should canonicalize them, but still most of the wording on the page is identical, so... Any professional opinion would be greatly appreciated. Thanks!
Intermediate & Advanced SEO | | naymark.biz0