PDF on financial site that duplicates ~50% of site content
-
I have a financial advisor client who has a downloadable PDF on his site that contains about 9 pages of good info. Problem is much of the content can also be found on individual pages of his site.
Is it best to noindex/follow the pdf? It would be great to let the few pages of original content be crawlable, but I'm concerned about the duplicate content aspect.
Thanks --
-
This is what we have done with pdfs. Assign rel="canonical" in .htaccess.
We did this with a few hundred files and it took google a LONG time to find and credit them.
-
You could set the header to noindex rather than rel=canonical
-
Personally I think it would be better not to index, it but if necessary, the index folder root seems like a good option
-
Thanks. Anybody want to weigh in on where to rel=canonical to? Home page?
-
If you are using apache, you should put it on your .htaccess with this form
<filesmatch “my-file.pdf”="">Header set Link ‘<http: misite="" my-file.html="">; rel=”canonical”‘</http:></filesmatch>
-
I think the right way here is to put the rel canonical in PDF header http://googlewebmastercentral.blogspot.com/2011/06/supporting-relcanonical-http-headers.html
-
I thought the idea was to put rel=canonical on the duplicated page, to signal that "hey, this page may look like duplicate content, but please refer to this canonical URL"?
Looks like there is a pdf option for rel=canonical, I guess the question is, what page on the site to make canonical?
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139394
Indicate the canonical version of a URL by responding with the
Link rel="canonical"
HTTP header. Addingrel="canonical"
to thehead
section of a page is useful for HTML content, but it can't be used for PDFs and other file types indexed by Google Web Search. In these cases you can indicate a canonical URL by responding with theLink rel="canonical"
HTTP header, like this (note that to use this option, you'll need to be able to configure your server):Link: <http: www.example.com="" downloads="" white-paper.pdf="">; rel="canonical"</http:>
-
Hi Keith,
I'm sorry, I should have clarified. The rel=canonical tags would be on your Web pages, not the PDF (they are irrelevant in a PDF document). Then Google will attribute your Web page as the original source of the content and will understand that the PDF just contains bits of content from those pages. In this instance I would include a rel=canonical tag on every page of your site, just to cover your bases. Hope that helps!
Dana
-
Not sure which page I would mark as being canonical, since the pdf contains content from several different pages on the site. I don't think it's possible to assign different rel=canonical tags to separate portions of a pdf, is it?
-
As long as you have rel=canonical tags properly in place, you don't need to worry about the PDF causing duplicate content problems. That way, any original content should be picked up and any duplicate can be attributed to your existing Web pages. Hope that's helpful!
Dana
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Are backlinks within duplicate content ignored or devalued?
From what I understand, Googles no longer has a "Duplicate Content Penalty" instead duplicate content simply isn't show in the search results. Does that mean that any links in the duplicate content are completely ignored, or devalued as far as the backlink profile of the site they are linking to? An example would be an article that might be published on two or three major industry websites. Are only the links from the first website GoogleBot discovers the article on counted or are all the links counted and you just won't see the article itself come up in search results for the second and third website?
Intermediate & Advanced SEO | | Consult19010 -
Duplicate Content That Isn't Duplicated
In Moz, I am receiving multiple messages saying that there is duplicate page content on my website. For example, these pages are being highlighted as duplicated: https://www.ohpopsi.com/photo-wallpaper/made-to-measure/pop-art-graffiti/farm-with-barn-and-animals-wall-mural-3824 and https://www.ohpopsi.com/photo-wallpaper/made-to-measure/animals-wildlife/little-elephants-garden-seamless-pattern-wall-mural-3614. As you can see, both pages are different products, therefore I can't apply a 301 redirect or canonical tag. What do you suggest?
Intermediate & Advanced SEO | | e3creative0 -
Internal Duplicate Content - Classifieds (Panda)
I've been wondering for a while now, how Google treats internal duplicate content within classified sites. It's quite a big issue, with customers creating their ads twice.. I'd guess to avoid the price of renewing, or perhaps to put themselves back to the top of the results. Out of 10,000 pages crawled and tested, 250 (2.5%) were duplicate adverts. Similarly, in terms of the search results pages, where the site structure allows the same advert(s) to appear under several unique URLs. A prime example would be in this example. Notice, on this page we have already filtered down to 1 result, but the left hand side filters all return that same 1 advert. Using tools like Siteliner and Moz Analytics just highlights these as urgent high priority issues, but I've always been sceptical. On a large scale, would this count as Panda food in your opinion, or does Google understand the nature of classifieds is different, and treat it as such? Appreciate thoughts. Thanks.
Intermediate & Advanced SEO | | Sayers1 -
SEO effect of content duplication across hub of sites
Hello, I have a question about a website I have been asked to work on. It is for a real estate company which is part of a larger company. Along with several other (rival) companies it has a website of property listings which receives a feed of properties from a central hub site - so lots of potential for page, title and meta content duplication (if if isn't already occuring) across the whole network of sites. In early investigation I don't see any of these sites ranking very well at all in Google for expected search phrases. Before I start working on things that might improve their rankings, I wanted to ask some questions from you guys: 1. How would such duplication (if it is occuring) effect the SEO rankings of such sites individually, or the whole network/hub collectively? 2. Is it possible to tell if such a site has been "burnt" for SEO purposes, especially if or from any duplication? 3. If such a site or the network has been totally burnt, are there any approaches or remedies that can be made to improve the site's SEO rankings significantly, or is the only/best option to start again from scratch with a brand new site, ensuring the use of new meta descriptions and unique content? Thanks in advance, Graham
Intermediate & Advanced SEO | | gmwhite9991 -
Ticket Industry E-commerce Duplicate Content Question
Hey everyone, How goes it? I've got a bunch of duplicate content issues flagged in my Moz report and I can't figure out why. We're a ticketing site and the pages that are causing the duplicate content are for events that we no longer offer tickets to, but that we will eventually offer tickets to again. Check these examples out: http://www.charged.fm/mlb-all-star-game-tickets http://www.charged.fm/fiba-world-championship-tickets I realize the content is thin and that these pages basically the same, but I understood that since the Title tags are different that they shouldn't appear to the Goog as duplicate content. Could anyone offer me some insight or solutions to this? Should they be noindexed while the events aren't active? Thanks
Intermediate & Advanced SEO | | keL.A.xT.o1 -
Avoiding Duplicate Content - Same Product Under Different Categories
Hello, I am taking some past advise and cleaning up my content navigation to only show 7 tabs as opposed to the 14 currently showing at www.enchantingquotes.com. I am creating a "Shop by Room" and "Shop by Category" link, which is what my main competitors do. My concern is the duplicate content that will happen since the same item will appear in both categories. Should I no follow the "Shop by Room" page? I am confused as to when I should use a no follow as opposed to a canonical tag? Thank you so much for any advise!
Intermediate & Advanced SEO | | Lepasti0 -
Is legacy duplicate content an issue?
I am looking for some proof, or at least evidence to whether or not sites are being hurt by duplicate content. The situation is, that there were 4 content rich newspaper/magazine style sites that were basically just reskins of each other. [ a tactic used under a previous regime 😉 ] The least busy of the sites has since been discontinued & 301d to one of the others, but the traffic was so low on the discontinued site as to be lost in noise, so it is unclear if that was any benefit. Now for the last ~2 years all the sites have had unique content going up, but there are still the archives of articles that are on all 3 remaining sites, now I would like to know whether to redirect, remove or rewrite the content, but it is a big decision - the number of duplicate articles? 263,114 ! Is there a chance this is hurting one or more of the sites? Is there anyway to prove it, short of actually doing the work?
Intermediate & Advanced SEO | | Fammy0 -
Wordpress Duplicate Content Due To Allocating Two Post Categories
It looks like google has done a pretty deep crawl of my site and is now showing around 40 duplicate content issues for posts that I have tagged in two seperate categories for example: http://www.musicliveuk.com/latest-news/live-music-boosts-australian-economy http://www.musicliveuk.com/live-music/live-music-boosts-australian-economy I use the all in one SEO pack and have checked the no index for categories, archive, and tag archive boxes so google shouldn't even crawl this content should it? . I guess the obvious answer is to only put each post in one category but I shouldn't have to should I? Some posts are relevant in more than once category.
Intermediate & Advanced SEO | | SamCUK0