PDF for link building - avoiding duplicate content
-
Hello,
We've got an article that we're turning into a PDF. Both the article and the PDF will be on our site. This PDF is a good, thorough piece of content on how to choose a product.
We're going to strip out all of the links to our in the article and create this PDF so that it will be good for people to reference and even print. Then we're going to do link building through outreach since people will find the article and PDF useful.
My question is, how do I use rel="canonical" to make sure that the article and PDF aren't duplicate content?
Thanks.
-
Hey Bob
I think you should forget about any kind of perceived conventions and have whatever you think works best for your users and goals.
Again, look at unbounce, that is a custom landing page with a homepage link (to share the love) but not the general site navigation.
They also have a footer to do a bit more link love but really, do what works for you.
Forget conventions - do what works!
Hope that helps
Marcus -
I see, thanks! I think it's important not to have the ecommerce navigation on the page promoting the pdf. What would you say is ideal as far as the graphical and navigation components of the page with the PDF on it - what kind of navigation and graphical header should I have on it?
-
Yep, check the HTTP headers with webbug or there are a bunch of browser plugins that will let you see the headers for the document.
That said, I would push to drive the links to the page though rather than the document itself and just create a nice page that houses the document and make that the link target.
You could even make the PDF link only available by email once they have singed up or some such as canonical is only a directive and you would still be better getting those links flooding into a real page on the site.
You could even offer up some HTML to make this easier for folks to link to that linked to your main page. If you take a look at any savvy infographics etc folks will try to draw a link into a page rather than the image itself for the very same reasons.
If you look at something like the Noobs Guide to Online Marketing from Unbounce then you will see something like this as the suggested linking code:
[](<strong>http://unbounce.com/noob-guide-to-online-marketing-infographic/</strong>)
[
](<strong>http://unbounce.com/noob-guide-to-online-marketing-infographic/</strong>)
[](<strong>http://unbounce.com/noob-guide-to-online-marketing-infographic/</strong>)
Unbounce – The DIY Landing Page Platform
So, the image is there but the link they are pimping is a standard page:
http://unbounce.com/noob-guide-to-online-marketing-infographic/
They also cheekily add an extra homepage link in as well with some keywords and the brand so if folks don't remove that they still get that benefit.
Ultimately, it means that when links flood into the site they benefit the whole site rather than just promote one PDF.
Just my tuppence!
Marcus -
Thanks for the code Marcus.
Actually, the pdf is what people will be linking to. It's a guide for websites. I think the PDF will be much easier to promote than the article.I assume so anyway.
Is there a way to make sure my canonical code in htaccess is working after I insert the code?
Thanks again,
Bob
-
Hey Bob
There is a much easier way to do this and simply have your PDFs that you don't want indexed in a folder that you block access to in robots.txt. This way you can just drop PDFs into articles and link to them knowing full well these pages will not be indexed.
Assuming you had a PDF called article.pdf in a folder called pdfs/ then the following would prevent indexation.
User-agent: * Disallow: /pdfs/
Or to just block the file itself:
User-agent: *
Disallow: /pdfs/yourfile.pdf Additionally, There is no reason not to add the canonical link as well and if you find people are linking directly to the PDF then having this would ensure that the equity associated with those links was correctly attributed to the parent page (always a good thing).Header add Link '<http: www.url.co.uk="" pdfs="" article.html="">; </http:> rel="canonical"'
Generally, there are better ways to block indexation than with robots.txt but in the case of PDFs, we really don't want these files indexed as they make for such poor landing pages (no navigation) and we certainly want to remove any competition or duplication between the page and the PDF so in this case, it makes for a quick, painless and suitable solution.
Hope that helps!
Marcus -
Thanks ThompsonPaul,
Say the pdf is located at
domain.com/pdfs/white-papers.pdf
and the article that I want to rank is at
domain.com/articles/article.html
do I simply add this to my htaccess file?:
Header add Link "<http: www.domain.com="" articles="" article.html="">; rel="canonical""</http:>
-
You can insert the canonical header link using your site's .htaccess file, Bob. I'm sure Hostgator provides access to the htaccess file through ftp (sometimes you have to turn on "show hidden files") or through the file manager built into your cPanel.
Check tip #2 in this recent SEOMoz blog article for specifics:
seomoz.org/blog/htaccess-file-snippets-for-seosJust remember too - you will want to do the same kind of on-page optimization for the PDF as you do for regular pages.
- Give it a good, descriptive, keyword-appropriate, dash-separated file name. (essential for usability as well, since it will become the title of the icon when saved to someone's desktop)
- Fill out the metadata for the PDF, especially the Title and Description. In Acrobat it's under File -> Properties -> Description tab (to get the meta-description itself, you'll need to click on the Additional Metadata button)
I'd be tempted to build the links to the html page as much as possible as those will directly help ranking, unlike the PDF's inbound links which will have to pass their link juice through the canonical, assuming you're using it. Plus, the visitor will get a preview of the PDF's content and context from the rest of your site which which may increase trust and engender further engagement..
Your comment about links in the PDF got kind of muddled, but you'll definitely want to make certain there are good links and calls to action back to your website within the PDF - preferably on each page. Otherwise there's no clear "next step" for users reading the PDF back to a purchase on your site. Make sure to put Analytics tracking tags on these links so you can assess the value of traffic generated back from the PDF - otherwise the traffic will just appear as Direct in your Analytics.
Hope that all helps;
Paul
-
Can I just use htaccess?
See here: http://www.seomoz.org/blog/how-to-advanced-relcanonical-http-headers
We only have one pdf like this right now and we plan to have no more than five.
Say the pdf is located at
domain.com/pdfs/white-papers.pdf
and the article that I want to rank is at
domain.com/articles/article.pdf
do I simply add this to my htaccess file?:
Header add Link "<http: www.domain.com="" articles="" article.pdf="">; rel="canonical""</http:>
-
How do I know if I can do an HTTP header request? I'm using shared hosting through hostgator.
-
PDF seem to not rank as well as other normal webpages. They still rank do not get me wrong, we have over 100 pdf pages that get traffic for us. The main version is really up to you, what do you want to show in the search results. I think it would be easier to rank for a normal webpage though. If you are doing a rel="canonical" it will pass most of the link juice, not all but most.
-
PDF seem to not rank as well as other normal webpages. They still rank do not get me wrong, we have over 100 pdf pages that get traffic for us. The main version is really up to you, what do you want to show in the search results. I think it would be easier to rank for a normal webpage though. If you are doing a rel="canonical" it will pass most of the link juice, not all but most.
-
Thank you DoRM,
I assume that the PDF is what I want to be the main version since that is what I'll be marketing, but I could be wrong? What if I get backlinks to both pages, will both sets of backlinks count?
-
Indicate the canonical version of a URL by responding with the
Link rel="canonical"
HTTP header. Addingrel="canonical"
to thehead
section of a page is useful for HTML content, but it can't be used for PDFs and other file types indexed by Google Web Search. In these cases you can indicate a canonical URL by responding with theLink rel="canonical"
HTTP header, like this (note that to use this option, you'll need to be able to configure your server):Link: <http: www.example.com="" downloads="" white-paper.pdf="">; rel="canonical"</http:>
Google currently supports these link header elements for Web Search only.
You can read more her http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139394
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Duplicate content on product pages
Hi, We are considering the impact when you want to deliver content directly on the product pages. If the products were manufactured in a specific way and its the same process across 100 other products you might want to tell your readers about it. If you were to believe the product page was the best place to deliver this information for your readers then you could potentially be creating mass content duplication. Especially as the storytelling of the product could equate to 60% of the page content this could really flag as duplication. Our options would appear to be:1. Instead add the content as a link on each product page to one centralised URL and risk taking users away from the product page (not going to help with conversion rate or designers plans)2. Put the content behind some javascript which requires interaction hopefully deterring the search engine from crawling the content (doesn't fit the designers plans & users have to interact which is a big ask)3. Assign one product as a canonical and risk the other products not appearing in search for relevant searches4. Leave the copy as crawlable and risk being marked down or de-indexed for duplicated contentIts seems the search engines do not offer a way for us to serve this great content to our readers with out being at risk of going against guidelines or the search engines not being able to crawl it.How would you suggest a site should go about this for optimal results?
Intermediate & Advanced SEO | | FashionLux2 -
Will merging sites create a duplicate content penalty?
I have 2 sites that would be better suited being merged and creating a more authoritative site. Basically I'de like to merge site A in to site B. If I add new pages from site A to Site B and create 301 redirects for those pages on site A to the new pages on Site B is that the best way to go about it? As the pages are already indexed would this create any duplicate content issue or would the redirect solve this?
Intermediate & Advanced SEO | | boballanjones0 -
Scraping / Duplicate Content Question
Hi All, I understanding the way to protect content such as a feature rich article is to create authorship by linking to your Google+ account. My Question
Intermediate & Advanced SEO | | Mark_Ch
You have created a webpage that is informative but not worthy to be an article, hence no need create authorship in Google+
If a competitor comes along and steals this content word for word, something similar, creates their own Google+ page, can you be penalised? Is there any way to protect yourself without authorship and Google+? Regards Mark0 -
Partial duplicate content and canonical tags
Hi - I am rebuilding a consumer website, and each product page will contain a unique product image, and a sentence or two about the product (and we tend to use a lot of the same words in different ways across products). I'd like to have a tabbed area below the product info that talks about the overall product line, and this content would be duplicate across all the product pages (a "Why use our products" type of thing). I'd have this duplicate content also living on its own URL's so they can be found alone in the SERP's. Question is, do I need to add the canonical tag to this page, since there's partial duplicate content on the product pages? And if I did that, would my product pages go un-indexed?? I understand how to handle completely duplicated content, it's the partial duplicate that I'm having difficulty figuring out.
Intermediate & Advanced SEO | | Jenny10 -
Can I duplicate my websites content on Ebay Store?
Our company is setting up a store on Ebay. Is it okay to duplicate our content descriptions on our ebay store with a link going back to our website? Or would this potentially hurt us in Search?
Intermediate & Advanced SEO | | hfranz0 -
Daily Link Building tactics to move the needle
I know most of you may frown upon this question, but to those of you who are still going after blog commenting, forum posting, Q&A sites (even if that means you're getting nofollow links), do you have any recommendations on a guide/blog post that describes how to create a daily "low level" link building program to supplement the higher level, relationship dependent link building that you're already doing? Thanks in advance!
Intermediate & Advanced SEO | | pbhatt0 -
How to avoid too many "On Page Links"?
Hi everyone I don't seem to be able to keep big G off my back, even though I do not engage in any black hat or excessive optimization practices. Due to another unpleasant heavy SERP "fluctuation" I am in investigation mode yet again and want to take a closer look at one of the warnings within the SEOmoz dashboard, which is "Too many on page links". Looking at my statistics this is clearly the case. I wonder how you can even avoid that at times. I have a lot of information on my homepage that links out to subpages. I get the feeling that even the links within the roll-over menus (or dropdown) are counted. Of course, in that case then you will end up with a crazy amount of on page links. What about blog-like news entries on your homepage that link to other pages as well? And not to forget the links that result from the tags underneath a post? What am I trying to get at? Well, do you feel that a bad website template may cause this issue i.e. are the links from roll-over menus counted as links on the homepage even though they are not directly visible? I am not sure how to cut down on the issue as the sidebar modules are present on every page and thus up the links count wherever you are on the site. On another note, I've seen plenty of homepages with excessive information and links going out, would they be suffering from the search engines' hammer too? How do you manage the too many on page links issue? Many thanks for your input!
Intermediate & Advanced SEO | | Hermski0 -
What is a good, structured link building campaign?
I feel like I've tried everything up to this point. I also survived all of the recent Google updates, including the recent EMD update (my site is an EMD). Also I am in the financial sector. I blog every day, I've made some great infographics. I have a very nice website (much better than the competition's), I've done my on-page SEO. I've done the basic link building too. I've hit good business directories, good blog directories (like Technorati), and infographic directories. I've done a bit of comment linkbuilding but I don't see that being very fruitful. I also have a large twitter following + a twitter tribe that shares all of my blog posts. I can't say I've managed to build any sort of community however. My traffic has grown from 0 to anywhere from 80 to 160 visitors every day in just 3 months, and I am happy with that progress but it seems like I have plateaued. Sure I will continue creating content, but what can I do about link building? That's where the results are probably going to come from. I've read all the articles about it, took advice from LinkBuildingSchool.com... but at this point, I'm not sure how to continue. I dont want to continue going after blog comment links, I want quality. Any advice?
Intermediate & Advanced SEO | | MangoMan160