PDF on financial site that duplicates ~50% of site content
-
I have a financial advisor client who has a downloadable PDF on his site that contains about 9 pages of good info. Problem is much of the content can also be found on individual pages of his site.
Is it best to noindex/follow the pdf? It would be great to let the few pages of original content be crawlable, but I'm concerned about the duplicate content aspect.
Thanks --
-
This is what we have done with pdfs. Assign rel="canonical" in .htaccess.
We did this with a few hundred files and it took google a LONG time to find and credit them.
-
You could set the header to noindex rather than rel=canonical
-
Personally I think it would be better not to index, it but if necessary, the index folder root seems like a good option
-
Thanks. Anybody want to weigh in on where to rel=canonical to? Home page?
-
If you are using apache, you should put it on your .htaccess with this form
<filesmatch “my-file.pdf”="">Header set Link ‘<http: misite="" my-file.html="">; rel=”canonical”‘</http:></filesmatch>
-
I think the right way here is to put the rel canonical in PDF header http://googlewebmastercentral.blogspot.com/2011/06/supporting-relcanonical-http-headers.html
-
I thought the idea was to put rel=canonical on the duplicated page, to signal that "hey, this page may look like duplicate content, but please refer to this canonical URL"?
Looks like there is a pdf option for rel=canonical, I guess the question is, what page on the site to make canonical?
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139394
Indicate the canonical version of a URL by responding with the
Link rel="canonical"
HTTP header. Addingrel="canonical"
to thehead
section of a page is useful for HTML content, but it can't be used for PDFs and other file types indexed by Google Web Search. In these cases you can indicate a canonical URL by responding with theLink rel="canonical"
HTTP header, like this (note that to use this option, you'll need to be able to configure your server):Link: <http: www.example.com="" downloads="" white-paper.pdf="">; rel="canonical"</http:>
-
Hi Keith,
I'm sorry, I should have clarified. The rel=canonical tags would be on your Web pages, not the PDF (they are irrelevant in a PDF document). Then Google will attribute your Web page as the original source of the content and will understand that the PDF just contains bits of content from those pages. In this instance I would include a rel=canonical tag on every page of your site, just to cover your bases. Hope that helps!
Dana
-
Not sure which page I would mark as being canonical, since the pdf contains content from several different pages on the site. I don't think it's possible to assign different rel=canonical tags to separate portions of a pdf, is it?
-
As long as you have rel=canonical tags properly in place, you don't need to worry about the PDF causing duplicate content problems. That way, any original content should be picked up and any duplicate can be attributed to your existing Web pages. Hope that's helpful!
Dana
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
No-index pages with duplicate content?
Hello, I have an e-commerce website selling about 20 000 different products. For the most used of those products, I created unique high quality content. The content has been written by a professional player that describes how and why those are useful which is of huge interest to buyers. It would cost too much to write that high quality content for 20 000 different products, but we still have to sell them. Therefore, our idea was to no-index the products that only have the same copy-paste descriptions all other websites have. Do you think it's better to do that or to just let everything indexed normally since we might get search traffic from those pages? Thanks a lot for your help!
Intermediate & Advanced SEO | | EndeR-0 -
Duplicate Content www vs. non-www and best practices
I have a customer who had prior help on his website and I noticed a 301 redirect in his .htaccess Rule for duplicate content removal : www.domain.com vs domain.com RewriteCond %{HTTP_HOST} ^MY-CUSTOMER-SITE.com [NC]
Intermediate & Advanced SEO | | EnvoyWeb
RewriteRule (.*) http://www.MY-CUSTOMER-SITE.com/$1 [R=301,L,NC] The result of this rule is that i type MY-CUSTOMER-SITE.com in the browser and it redirects to www.MY-CUSTOMER-SITE.com I wonder if this is causing issues in SERPS. If I have some inbound links pointing to www.MY-CUSTOMER-SITE.com and some pointing to MY-CUSTOMER-SITE.com, I would think that this rewrite isn't necessary as it would seem that Googlebot is smart enough to know that these aren't two sites. -----Can you comment on whether this is a best practice for all domains?
-----I've run a report for backlinks. If my thought is true that there are some pointing to www.www.MY-CUSTOMER-SITE.com and some to the www.MY-CUSTOMER-SITE.com, is there any value in addressing this?0 -
Does having a page that ends with ? cause duplicate content?
I am working on a site that has lots of dynamic parameters. So lets say we have www.example.com/page?parameter=1 When the page has no parameters you can still end up at www.example.com/page? Should I redirect this to www.example.com/page/ ? Im not sure if Google ignores this, or if these pages need to be dealt with. Thanks
Intermediate & Advanced SEO | | MarloSchneider0 -
Dynamic 301's causing duplicate content
Hi, wonder if anyone can help? We have just changed our site which was hosted on IIS and the page url's were like this ( example.co.uk/Default.aspx?pagename=About-Us ). The new page url is example.co.uk/About-Us/ and is using Apache. The 301's our developer told us to use was in this format: RewriteCond %{REQUEST_URI} ^/Default.aspx$
Intermediate & Advanced SEO | | GoGroup51
RewriteCond %{QUERY_STRING} ^pagename=About-Us$
RewriteRule ^(.*)$ http://www.domain.co.uk/About-Us/ [R=301,L] This seemed to work from a 301 point of view; however it also seemed to allow both of the below URL's to give the same page! example.co.uk/About-Us/?pagename=About-Us example.co.uk/About-Us/ Webmaster Tools has now picked up on this and is seeing it a duplicate content. Can anyone help why it would be doing this please. I'm not totally clued up and our host/ developer cant understand it too. Many Thanks0 -
Sanitary ware and accessories site with 1000s duplicate product titles.
sanitary ware and accessories site with 1000s duplicate product titles.as it in title.there are around 1400 links index in Google and many of them duplicate title tags duplicate meta description .many of them are just a products pages .i am looking a solution for making these correct .site equipped with CMS also.but there are no facility to enter meta details except just a tile and price. What can i do to solve this problem ? If i no index products pages will it effect to SERP ? Is it ok to create separate landing pages for each kews using HTML static pages like www.mysite.com/sanitary ware.html expert advices need <colgroup><col width="166"></colgroup>
Intermediate & Advanced SEO | | innofidelity
| sanitary ware |0 -
Wordpress Duplicate Content Due To Allocating Two Post Categories
It looks like google has done a pretty deep crawl of my site and is now showing around 40 duplicate content issues for posts that I have tagged in two seperate categories for example: http://www.musicliveuk.com/latest-news/live-music-boosts-australian-economy http://www.musicliveuk.com/live-music/live-music-boosts-australian-economy I use the all in one SEO pack and have checked the no index for categories, archive, and tag archive boxes so google shouldn't even crawl this content should it? . I guess the obvious answer is to only put each post in one category but I shouldn't have to should I? Some posts are relevant in more than once category.
Intermediate & Advanced SEO | | SamCUK0 -
Concerns about duplicate content issues with australian and us version of website
My company has an ecommerce website that's been online for about 5 years. The url is www.betterbraces.com. We're getting ready to launch an australian version of the website and the url will be www.betterbraces.com.au. The australian website will have the same look as the US website and will contain about 200 of the same products that are featured on the US website. The only major difference between the two websites is the price that is charged for the products. The australian website will be hosted on the same server as the US website. To ensure Australians don't purchase from the US site we are going to have a geo redirect in place that sends anyone with a AU ip address to the australian website. I am concerned that the australian website is going to have duplicate content issues. However, I'm not sure if the fact that the domains are so similar coupled with the redirect will help the search engines understand that these sites are related. I would appreciate any recommendations on how to handle this situation to ensure oue rankings in the search engines aren't penalized. Thanks in advance for your help. Alison French
Intermediate & Advanced SEO | | djo-2836690 -
Should I do something about this duplicate content? If so, what?
On our real estate site we have our office listings displayed. The listings are generated from a scraping script that I wrote. As such, all of our listings have the exact same description snippet as every other agent in our office. The rest of the page consists of site-wide sidebars and a contact form. The title of the page is the address of the house and so is the H1 tag. Manually changing the descriptions is not an option. Do you think it would help to have some randomly generated stuff on the page such as "similar listings"? Any other ideas? Thanks!
Intermediate & Advanced SEO | | MarieHaynes0