PDF on financial site that duplicates ~50% of site content
-
I have a financial advisor client who has a downloadable PDF on his site that contains about 9 pages of good info. Problem is much of the content can also be found on individual pages of his site.
Is it best to noindex/follow the pdf? It would be great to let the few pages of original content be crawlable, but I'm concerned about the duplicate content aspect.
Thanks --
-
This is what we have done with pdfs. Assign rel="canonical" in .htaccess.
We did this with a few hundred files and it took google a LONG time to find and credit them.
-
You could set the header to noindex rather than rel=canonical
-
Personally I think it would be better not to index, it but if necessary, the index folder root seems like a good option
-
Thanks. Anybody want to weigh in on where to rel=canonical to? Home page?
-
If you are using apache, you should put it on your .htaccess with this form
<filesmatch “my-file.pdf”="">Header set Link ‘<http: misite="" my-file.html="">; rel=”canonical”‘</http:></filesmatch>
-
I think the right way here is to put the rel canonical in PDF header http://googlewebmastercentral.blogspot.com/2011/06/supporting-relcanonical-http-headers.html
-
I thought the idea was to put rel=canonical on the duplicated page, to signal that "hey, this page may look like duplicate content, but please refer to this canonical URL"?
Looks like there is a pdf option for rel=canonical, I guess the question is, what page on the site to make canonical?
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139394
Indicate the canonical version of a URL by responding with the
Link rel="canonical"
HTTP header. Addingrel="canonical"
to thehead
section of a page is useful for HTML content, but it can't be used for PDFs and other file types indexed by Google Web Search. In these cases you can indicate a canonical URL by responding with theLink rel="canonical"
HTTP header, like this (note that to use this option, you'll need to be able to configure your server):Link: <http: www.example.com="" downloads="" white-paper.pdf="">; rel="canonical"</http:>
-
Hi Keith,
I'm sorry, I should have clarified. The rel=canonical tags would be on your Web pages, not the PDF (they are irrelevant in a PDF document). Then Google will attribute your Web page as the original source of the content and will understand that the PDF just contains bits of content from those pages. In this instance I would include a rel=canonical tag on every page of your site, just to cover your bases. Hope that helps!
Dana
-
Not sure which page I would mark as being canonical, since the pdf contains content from several different pages on the site. I don't think it's possible to assign different rel=canonical tags to separate portions of a pdf, is it?
-
As long as you have rel=canonical tags properly in place, you don't need to worry about the PDF causing duplicate content problems. That way, any original content should be picked up and any duplicate can be attributed to your existing Web pages. Hope that's helpful!
Dana
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Duplicate content in external domains
Hi,
Intermediate & Advanced SEO | | teconsite
I have been asking about this case before, but now my question is different.
We have a new school that offers courses and programs . Its website is quite new (just a five months old) It is very common between these schools to publish the courses and programs in training portals to promote those courses and to increase the visibility of them. As the website is really new, I found when I was doing the technical audit, that when I googled a text snipped from the site, the new school website was being omitted, and instead, the course portals are being shown. Of course, I know that the best recommendation would be to create a different content for that purpose, but I would like to explore if there is more options. Most of those portals doesn't allow to place a link to the website in the content and not to mention canonical. Of course most of them are older than the new website and their authority is higher. so,... with this situation, I think the only solution is to create a different content for the website and for the portals.
I was thinking that maybe, If we create the content first in the new website, send it to the index, and wait for google to index it, and then send the content to the portals, maybe we would have more opportunites to not be ommited by Google in search results. What do you think? Thank you!0 -
How do I geo-target continents & avoid duplicate content?
Hi everyone, We have a website which will have content tailored for a few locations: USA: www.site.com
Intermediate & Advanced SEO | | AxialDev
Europe EN: www.site.com/eu
Canada FR: www.site.com/fr-ca Link hreflang and the GWT option are designed for countries. I expect a fair amount of duplicate content; the only differences will be in product selection and prices. What are my options to tell Google that it should serve www.site.com/eu in Europe instead of www.site.com? We are not targeting a particular country on that continent. Thanks!0 -
Reinforcing Rel Canonical? (Fixing Duplicate Content)
Hi Mozzers, We're having trouble with duplicate content between two sites, so we're looking to add some oomph to the rel canonical link elements we put on one of our sites pointing towards the other to help speed up the process and give Google a bigger hint. Would adding a hyperlink on the "copying" website pointing towards the "original" website speed this process up? Would we get in trouble if added about 80,000 links (1 on each product page) with a link to the matching product on the other site? For example, we could use text like "Buy XY product on Other Brand Name and receive 10% off!"
Intermediate & Advanced SEO | | Travis-W0 -
Issue with duplicate content in blog
I have blog where all the pages r get indexed, with rich content in it. But In blogs tag and category url are also get indexed. i have just added my blog in seomoz pro, and i have checked my Crawl Diagnostics Summary in that its showing me that some of your blog content are same. For Example: www.abcdef.com/watches/cool-watches-of-2012/ these url is already get indexed, but i have asigned some tag and catgeory fo these url also which have also get indexed with the same content. so how shall i stop search engines to do not crawl these tag and categories pages. if i have more no - follow tags in my blog does it gives negative impact to search engines, any alternate way to tell search engines to stop crawling these category and tag pages.
Intermediate & Advanced SEO | | sumit600 -
How To Handle Duplicate Content Regarding A Corp With Multiple Sites and Locations?
I have a client that has 800 locations. 50 of them are mine. The corporation has a standard website for their locations. The only thing different is their location info on each page. The majority of the content is the same for each website for each location. What can be done to minimize the impact/penalty of having "duplicate or near duplicate" content on their sites? Assuming corporate won't allow the pages to be altered.
Intermediate & Advanced SEO | | JChronicle0 -
Duplicate Content Issue
Why do URL with .html or index.php at the end are annoying to the search engine? I heard it can create some duplicate content but I have no idea why? Could someone explain me why is that so? Thank you
Intermediate & Advanced SEO | | Ideas-Money-Art0 -
Why duplicate content for same page?
Hi, My SEOMOZ crawl diagnostic warn me about duplicate content. However, to me the content is not duplicated. For instance it would give me something like: (URLs/Internal Links/External Links/Page Authority/Linking Root Domains) http://www.nuxeo.com/en/about/contact?utm_source=enews&utm_medium=email&utm_campaign=enews20110516 /1/1/31/2 http://www.nuxeo.com/en/about/contact?utm_source=enews&utm_medium=email&utm_campaign=enews20110711 0/0/1/0 http://www.nuxeo.com/en/about/contact?utm_source=enews&utm_medium=email&utm_campaign=enews20110811 0/0/1/0 http://www.nuxeo.com/en/about/contact?utm_source=enews&utm_medium=email&utm_campaign=enews20110911 0/0/1/0 Why is this seen as duplicate content when it is only URL with campaign tracking codes to the same content? Do I need to clean this?Thanks for answer
Intermediate & Advanced SEO | | nuxeo0 -
Duplicate Content across 4 domains
I am working on a new project where the client has 5 domains each with identical website content. There is no rel=canonical. There is a great variation in the number of pages in the index for each of the domains (from 1 to 1250). OSE shows a range of linking domains from 1 to 120 for each domain. I will be strongly recommending to the client to focus on one website and 301 everything from the other domains. I would recommend focusing on the domain that has the most pages indexed and the most referring domains but I've noticed the client has started using one of the other domains in their offline promotional activity and it is now their preferred domain. What are your thoughts on this situation? Would it be better to 301 to the client's preferred domain (and lose a level of ranking power throught the 301 reduction factor + wait for other pages to get indexed) or stick with the highest ranking/most linked domain even though it doesn't match the client's preferred domain used for email addresses etc. Or would it better to use cross-domain canoncial tags? Thanks
Intermediate & Advanced SEO | | bjalc20110