PDFs - Dupe Content
-
Hi
I have some PDFs linked to from a page with little content, so I'm thinking it's best to extract the copy from the PDF and put it on the page as body text, with the PDF still linked too. Will this count as dupe content?
Or is it best to use a PDF plugin so the page opens the PDF automatically, and gives the page content that way?
Cheers
Dan
-
Should be different, but you would have to look at them to make sure.
-
PS - is a PDF-to-HTML converter different from a plugin that loads the PDF as an open page when you click it? Or is it the same thing?
-
That is what I was going to suggest - setting up a canonical in the HTTP header of the PDF, pointing back to the article:
https://support.google.com/webmasters/answer/139394?hl=en
As another option, you can just block access to the PDFs to keep them out of the index as well.
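For anyone wondering what those two options look like in practice, here is a rough sketch in Python of the extra response headers the server would send with the PDF. The function name and the example.com URL are made up for illustration, and using `X-Robots-Tag: noindex` is just one way to keep a file out of the index (I'm assuming that is acceptable here; a robots.txt rule is a different mechanism). How you actually attach the headers depends on your server.

```python
from typing import Dict, Optional


def pdf_response_headers(canonical_url: Optional[str] = None,
                         noindex: bool = False) -> Dict[str, str]:
    """Build HTTP response headers for a PDF download (hypothetical helper)."""
    headers = {"Content-Type": "application/pdf"}
    if canonical_url:
        # A PDF has no <head>, so rel="canonical" goes in an HTTP Link header.
        headers["Link"] = f'<{canonical_url}>; rel="canonical"'
    if noindex:
        # Asks search engines not to index this file.
        headers["X-Robots-Tag"] = "noindex"
    return headers


# Option 1: canonicalise the PDF to the HTML article (placeholder URL).
print(pdf_response_headers("https://www.example.com/article/"))

# Option 2: keep the PDF out of the index instead.
print(pdf_response_headers(noindex=True))
```

You would normally pick one option or the other for a given PDF, not both.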
-
Thanks Chris
Yes, you can canonicalise the PDF to the HTML (according to the comments on that article I just linked to, anyway).
-
Hi Dan,
Yes, PDFs are crawlable (sorry for the confusion!). If you were to put it into, say, a .zip or .rar (or similar) it wouldn't be crawled, or you could noindex the PDF, I guess. You would need to put the PDF (download) behind something that couldn't be crawled. You could try rel=canonical, but I've never tried it with a PDF so I'm not sure how that would go.
Hope that enlightens you a bit.
-
Thanks Chris, although I thought PDFs were crawlable: http://www.lunametrics.com/blog/2013/01/10/seo-pdfs/
Hence why I'm worried about dupe content if I use the content of the PDF as body text too. Or are you saying I should nofollow the link to the PDF if I use its content as body text, because it is considered dupe content in that scenario?
Ideally I want both: the PDF's copy used as body text on the page and the PDF as a linkable download, or the page as an embed of the open PDF via a plugin.
-
What would give the user the best experience is the real question. I would say put it on the page; then if the user is lacking a plugin they can still read it. If you have it as a downloadable PDF it shouldn't be able to get crawled, thus avoiding the problem.
Hope that helps.