The Moz Q&A Forum

Moz Q&A is closed.

After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies.

Can PDF be seen as duplicate content? If so, how to prevent it?

Intermediate & Advanced SEO
  • Gestisoft-Qc
    Gestisoft-Qc Subscriber last edited by Jan 23, 2012, 7:13 PM

    I see no reason why PDF couldn't be considered duplicate content but I haven't seen any threads about it.

    We publish loads of product documentation provided by manufacturers as well as White Papers and Case Studies. These give our customers and prospects a better idea of our solutions and help them along their buying process.

    However, I'm not sure if it would be better to make them non-indexable to prevent duplicate content issues. Clearly we would prefer a solution where we benefit from the keywords in the documents.

    Does anyone have insight on how to deal with PDFs provided by third parties?

    Thanks in advance.

    • ilonka65
      ilonka65 last edited by Apr 10, 2015, 2:38 AM Apr 10, 2015, 2:38 AM

      It looks like Google is not crawling tabs anymore, so if your PDFs are tabbed within pages, it might not be an issue: https://www.seroundtable.com/google-hidden-tab-content-seo-19489.html

      • ASriv
        ASriv Subscriber last edited by May 1, 2014, 4:05 PM May 1, 2014, 4:05 PM

        Sure, I understand - thanks EGOL

        • EGOL
          EGOL @ASriv last edited by May 1, 2014, 2:41 PM May 1, 2014, 2:41 PM

          I would like to give that to you but it is on a site that I don't share in forums.  Sorry.

          • ASriv
            ASriv Subscriber last edited by May 1, 2014, 2:15 PM May 1, 2014, 2:15 PM

            Thanks EGOL

            That would be ideal.

            For a site that has multiple authors and with it being impractical to get a developer involved every time a web page / blog post and the pdf are created, is there a single line of code that could be used to accomplish this in .htaccess?

            If so, would you be able to show me an example please?

            • EGOL
              EGOL last edited by May 1, 2014, 2:08 PM May 1, 2014, 2:08 PM

              I assigned rel=canonical to my PDFs using htaccess.

              Then, if anyone links to the PDFs, the link value gets passed to the web page.
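For readers wondering what such a rule produces: the approach amounts to sending an HTTP `Link` header with `rel="canonical"` along with each PDF response. The Python sketch below only builds that header value. The `PDF_TO_PAGE` mapping, the example.com URLs, and the function name are hypothetical, and this is not EGOL's actual configuration.

```python
from typing import Optional

# Hypothetical mapping from PDF paths to the web pages they duplicate.
PDF_TO_PAGE = {
    "/downloads/white-paper.pdf": "https://www.example.com/white-paper/",
}


def canonical_link_header(pdf_path: str) -> Optional[str]:
    """Return a Link header value pointing at the canonical page, or None."""
    page_url = PDF_TO_PAGE.get(pdf_path)
    if page_url is None:
        # No known HTML equivalent, so no canonical header is suggested.
        return None
    # Header value format: <URL>; rel="canonical"
    return f'<{page_url}>; rel="canonical"'


header = canonical_link_header("/downloads/white-paper.pdf")
print("Link:", header)
```

Whatever serves the PDF (a web server rule or application code) would attach this value as the `Link` response header. The mapping has to be maintained somewhere, which is the practical cost on a multi-author site.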

              • ASriv
                ASriv Subscriber last edited by May 1, 2014, 2:04 PM May 1, 2014, 2:04 PM

                Hi all

                I've been discussing the topic of making content available as both blog posts and pdf downloads today.

                Given that there is a lot of uncertainty and complexity around this issue of potential duplication, my plan is to house all the PDFs in a folder that we block with robots.txt.

                Anyone agree / disagree with this approach?
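Before relying on a folder block like this, it can help to confirm that the rule matches the intended URLs and nothing else. The sketch below uses Python's standard `urllib.robotparser` for that check. The robots.txt contents, the `/pdfs/` folder name, and the example.com URLs are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks a /pdfs/ folder for all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /pdfs/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url: str, user_agent: str = "Googlebot") -> bool:
    """Report whether the parsed rules allow user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


print(is_crawlable("https://www.example.com/pdfs/guide.pdf"))
print(is_crawlable("https://www.example.com/blog/guide/"))
```

The first URL falls under the disallowed folder and the second does not, so only the PDF copies are kept away from crawlers.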

                • Dr-Pete
                  Dr-Pete Staff @ATMOSMarketing56 last edited by Aug 1, 2013, 6:54 PM Aug 1, 2013, 6:54 PM

                  Unfortunately, there's no great way to have it both ways. If you want these pages to get indexed for the links, then they're potential duplicates. If Google filters them out, the links probably won't count. Worst case, it could cause Panda-scale problems. Honestly, I suspect the link value is minimal and outweighed by the risk, but it depends quite a bit on the scope of what you're doing and the general link profile of the site.

                  • ATMOSMarketing56
                    ATMOSMarketing56 Subscriber last edited by Aug 1, 2013, 3:30 PM Aug 1, 2013, 3:30 PM

                    I think you can set it to public or private (logged-in only) and even put a price-tag on it if you want. So yes setting it to private would help to eliminate the dup content issue, but it would also hide the links that I'm using to link-build.

                    I would imagine that, since this guide links back to our original site, it would be no different than someone copying the content from our site and linking back to us, thus crediting us as the original source, especially if we make sure it is indexed through GWMT before submitting it to other platforms. Are there any good resources that delve into that?

                    • Dr-Pete
                      Dr-Pete Staff last edited by Aug 1, 2013, 3:14 PM Aug 1, 2013, 3:14 PM

                      Potentially, but I'm honestly not sure how Scribd's pages are indexed. Don't you need to log in or something to actually see the content on Scribd?

                      • ATMOSMarketing56
                        ATMOSMarketing56 Subscriber last edited by Aug 1, 2013, 11:30 AM Aug 1, 2013, 11:30 AM

                        What about this instance:

                        (A) I made an "ultimate guide to X" and posted it on my site as individual HTML pages for each chapter

                        (B) I made a PDF version with the exact same content that people can download directly from the site

                        (C) I uploaded the PDF to sites like Scribd.com to help distribute it further, and build links with the links that are embedded in the PDF.

                        Would those all be dup content? Is (C) recommended or not?

                        • EGOL
                          EGOL @Gestisoft-Qc last edited by Jan 25, 2012, 11:39 PM Jan 25, 2012, 11:39 PM

                          Thanks! I am going to look into this. I'll let you know if I learn anything.

                          • Dr-Pete
                            Dr-Pete Staff @Gestisoft-Qc last edited by Jan 26, 2012, 12:54 PM Jan 25, 2012, 8:22 PM

                            If they duplicate your main content, I think the header-level canonical may be a good way to go. For the syndication scenario, it's tough, because then you're knocking those PDFs out of the rankings, potentially, in favor of someone else's content.

                            Honestly, I've seen very few people deal with canonicalization for PDFs, and even those cases were small or obvious (like a page with the exact same content being outranked by the duplicate PDF). It's kind of uncharted territory.

                            • EGOL
                              EGOL @Gestisoft-Qc last edited by Jan 25, 2012, 8:13 PM Jan 25, 2012, 8:13 PM

                              Thanks for all of your input, Dr. Pete. The example that you use is almost exactly what I have: hundreds of .pdfs on a fifty-page site. These .pdfs rank well in the SERPs, accumulate PageRank, and pass traffic and link value back to the main site through links embedded within the .pdf. They also have natural links from other domains. I don't want to block them or nofollow them, but your suggestion of using the header directive sounds pretty good.

                              • Dr-Pete
                                Dr-Pete Staff @Gestisoft-Qc last edited by Jan 26, 2012, 12:53 PM Jan 25, 2012, 7:15 PM

                                Oh, sorry - so these PDFs aren't duplicates with your own web/HTML content so much as duplicates with the same PDFs on other websites?

                                That's more like a syndication situation. It is possible that, if enough people post these PDFs, you could run into trouble, but I've never seen that. More likely, your versions just wouldn't rank. Theoretically, you could use the header-level canonical tag cross-domain, but I've honestly never seen that tested.

                                If you're talking about a handful of PDFs, they're a small percentage of your overall indexed content, and that content is unique, I wouldn't worry too much. If you're talking about 100s of PDFs on a 50-page website, then I'd control it. Unfortunately, at that point, you'd probably have to put the PDFs in a folder and outright block it. You'd remove the risk, but you'd stop ranking on those PDFs as well.

                                • EGOL
                                  EGOL @Gestisoft-Qc last edited by Jan 25, 2012, 1:56 PM Jan 25, 2012, 1:56 PM

                                  @EGOL: Can you expand a bit on your Author suggestion?

                                  I was wondering if there is a way to do rel=author for a pdf document.  I don't know how to do it and don't know if it is possible.

                                  • Gestisoft-Qc
                                    Gestisoft-Qc Subscriber @Dr-Pete last edited by Jan 24, 2012, 5:08 PM Jan 24, 2012, 5:08 PM

                                    To make sure I understand what I'm reading:

                                    • PDFs don't usually rank as well as regular pages (although it is possible)
                                    • It is possible to configure a canonical tag on a PDF

                                    My concern isn't that our PDFs may outrank the original content but rather getting slammed by Google for publishing them.

                                    Am I right in thinking a canonical tag prevents the PDF from accumulating link juice? If so, I would prefer not to use it, unless skipping it gets us slammed by Google.

                                    Has anyone experienced Google retribution for publishing PDFs that come from a third party?

                                    @EGOL: Can you expand a bit on your Author suggestion?

                                    Thanks all!

                                    • Dr-Pete
                                      Dr-Pete Staff last edited by Jan 24, 2012, 3:10 PM Jan 24, 2012, 3:09 PM

                                      I think it's possible, but I've only seen it in cases that are a bit hard to disentangle. For example, I've seen a PDF outrank a duplicate piece of regular content when the regular content had other issues (including massive duplication with other, regular content). My gut feeling is that it's unusual.

                                      If you're concerned about it, you can canonicalize PDFs with the header-level canonical directive. It's a bit more technically complex than the standard HTML canonical tag:

                                      http://googlewebmastercentral.blogspot.com/2011/06/supporting-relcanonical-http-headers.html

                                      I'm going to mark this as "Discussion", just in case anyone else has seen real-world examples.
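To check whether a PDF response actually carries the header-level canonical, one option is to read its `Link` header and pull out the URL marked `rel="canonical"`. The parser below is a simplified sketch, not a full implementation of the Link header grammar. The function name and sample URLs are hypothetical.

```python
import re
from typing import Optional

# Matches one link-value, e.g. <https://example.com/page>; rel="canonical"
_LINK_RE = re.compile(r'<([^>]*)>\s*((?:;\s*[^;,]+)*)')


def canonical_from_link_header(value: str) -> Optional[str]:
    """Return the URL tagged rel="canonical" in a Link header, or None."""
    for match in _LINK_RE.finditer(value):
        url, params = match.group(1), match.group(2)
        for param in params.split(";"):
            name, _, val = param.strip().partition("=")
            # rel may hold several space-separated relation types.
            rels = val.strip().strip('"').lower().split()
            if name.strip().lower() == "rel" and "canonical" in rels:
                return url
    return None


sample = '<https://www.example.com/white-paper/>; rel="canonical"'
print(canonical_from_link_header(sample))
```

In practice the header value would come from a HEAD or GET request against the PDF's URL. That request is left out here so the sketch runs without network access.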

                                      • EGOL
                                        EGOL last edited by Jan 23, 2012, 9:30 PM Jan 23, 2012, 9:28 PM

                                        I am really interested in hearing what others have to say about this.

                                        I know that .pdfs can be very valuable content.  They can be optimized, they rank in the SERPs, they accumulate PR and they can pass linkvalue.  So, to me it would be a mistake to block them from the index...

                                        However, I see your point about dupe content... they could also be thin content.  Will Panda whack you for thin and dupes in your PDFs?

                                        How can canonical be used... what about author?

                                        Anybody know anything about this?

                                        • MargaritaS
                                          MargaritaS last edited by Jan 24, 2012, 3:10 PM Jan 23, 2012, 7:20 PM

                                          Just like any other piece of duplicate content, you can use canonical link elements to specify the original piece of content (if there's indeed more than one identical piece). You could also block these types of files in the robots.txt, or use noindex-follow meta tags.

                                          Regards,

                                          Margarita
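One practical note on the meta tag option: a PDF has no HTML head to hold a robots meta tag, so for non-HTML files the equivalent directive is sent as an `X-Robots-Tag` HTTP response header. The sketch below shows the decision logic only. The function name is hypothetical, and the choice of `noindex, follow` simply mirrors the noindex-follow suggestion above.

```python
from typing import Dict


def extra_headers_for(path: str) -> Dict[str, str]:
    """Return extra response headers for a requested path."""
    if path.lower().endswith(".pdf"):
        # Header equivalent of <meta name="robots" content="noindex, follow">
        return {"X-Robots-Tag": "noindex, follow"}
    return {}


print(extra_headers_for("/docs/manual.pdf"))
print(extra_headers_for("/docs/"))
```

A server or framework hook would merge these headers into the response for each request, so individual authors would not need to touch anything when they upload a new PDF.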
