PDFs - Dupe Content
-
Hi
I have some PDFs linked to from a page with little content. Hence I'm thinking it's best to extract the copy from the PDF and have it on-page as body text, and the PDF will still be linked to. Will this count as dupe content?
Or is it best to use a PDF plugin so the page opens the PDF automatically, and hence gives the page content that way?
Cheers
Dan
-
Should be different, but you would have to look at them to make sure.
-
PS - is a PDF-to-HTML converter different from a plugin that loads the PDF as an open page when you click it? Or the same thing?
-
That is what I was going to suggest - setting up a canonical in the http header of the PDF back to the article
https://support.google.com/webmasters/answer/139394?hl=en
As another option, you can just block access to the PDFs to keep them out of the index as well.
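As a rough illustration of the two options above (a sketch, not a tested server setup): a PDF has no HTML head to hold a canonical tag, so the canonical goes in a `Link` response header with `rel="canonical"`. The URLs, the `/downloads/` folder and the helper name below are made up for the example. The robots.txt check uses Python's standard `urllib.robotparser`, which only does simple prefix matching.

```python
from urllib.robotparser import RobotFileParser


def canonical_link_header(html_url: str) -> tuple[str, str]:
    """Build the HTTP header that points a PDF at its HTML version."""
    # Hypothetical helper: returns a (name, value) pair to add to the PDF response.
    return ("Link", f'<{html_url}>; rel="canonical"')


# Option 1: canonicalise the PDF to the article page (example URL is an assumption).
header = canonical_link_header("https://www.example.com/articles/guide.html")
print(f"{header[0]}: {header[1]}")

# Option 2: keep crawlers away from the PDFs with robots.txt.
# Assumes all PDFs live under a /downloads/ folder (hypothetical layout).
robots_txt = """\
User-agent: *
Disallow: /downloads/
"""
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

pdf_blocked = not parser.can_fetch(
    "Googlebot", "https://www.example.com/downloads/guide.pdf"
)
page_allowed = parser.can_fetch(
    "Googlebot", "https://www.example.com/articles/guide.html"
)
print("PDF blocked:", pdf_blocked, "| article crawlable:", page_allowed)
```

Pick one or the other rather than both: a crawler that is blocked from fetching the PDF never requests it, so it never sees the canonical header.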
-
Thanks Chris
Yes, you can canonicalise the PDF to the HTML (according to the comments on that article I just linked to, anyway)
-
Hi Dan,
Yes, PDFs are crawlable (sorry for the confusion!). If you were to put it into, say, a .zip or .rar (or similar), it wouldn't be crawled, or you could noindex the link, I guess. You would need to stick the PDF (download) behind something that couldn't be crawled. You could try rel=canonical, but I've never tried it with a PDF, so I'm not sure how that would go.
Hope that enlightens you a bit.
-
Thanks Chris, although I thought PDFs were crawlable?: http://www.lunametrics.com/blog/2013/01/10/seo-pdfs/
Hence why I'm worried about dupe content if I use the content of the PDF as body text too. Or are you saying I should nofollow the link to the PDF if I use its content as body text, because it is considered dupe content in that scenario?
Ideally I want both: the copy used as body text on the page and the PDF as a linkable download, or the page as an embed of the open PDF via a plugin.
-
What would give the user the best experience is the real question. I would say put it on the page; then if the user is lacking a plugin they can still read it. If you have it as a downloadable PDF, it shouldn't be able to get crawled, thus avoiding the problem.
Hope that helps.