Indexing of PDF files
-
Hey all,
I understand how PDF files get indexed and how to remove them if required, so I'm not looking for 'how to' advice in this post. I just want a general opinion/consensus on whether you deliberately allow PDF files to be crawled and indexed.
Do you guys optimise the files for search?
If you do disallow them from being crawled and indexed, why?
Generally, what pros and cons have you found from having searchable PDF files as part of your indexed content?
-
No opinions here... just facts....
-
PDF files show in your Google backlinks
-
PDF files can contain anchor text backlinks
-
PDF files accumulate pagerank
-
PDF files pass pagerank
-
If you place obvious links in PDF files, people will click them and land on your .html pages
-
Other people sometimes grab your PDF files and place them on their own websites, giving you backlinks from their domains if you were smart enough to embed links within them
-
PDF files can be optimized, rank high in the search engines and pull in a LOT of traffic
-
Some types of content display and print much better in a PDF file than they do on a webpage
-
PDF files allow you to control the "look" of printed documents
-
A huge report is often better posted as a PDF than as html documents
-
You can lock PDF documents to keep others from monkeying with your content (determined people will get around this).
-
Contrary to popular belief, PDF documents can be monetized... just toss in a shopping link or links to pages where money can be made. I have not heard of anyone paying for ad space in a PDF, but there is no reason why that could not be done.
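For anyone weighing the allow/disallow choice from the original question, here is a minimal sketch of how to check which PDF URLs a given robots.txt leaves open to crawlers. It uses Python's standard `urllib.robotparser`. The robots.txt contents, the `example.com` URLs and the `crawlable` helper are all hypothetical, made up for illustration. Keep in mind that robots.txt governs crawling rather than indexing: a blocked PDF can still show up in results if other pages link to it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: public PDFs stay crawlable, one folder is blocked.
# The paths and the example.com domain below are assumptions for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /downloads/private/
"""


def crawlable(url: str, robots_txt: str = ROBOTS_TXT, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for pdf_url in (
        "https://example.com/downloads/whitepaper.pdf",
        "https://example.com/downloads/private/report.pdf",
    ):
        print(pdf_url, "crawlable" if crawlable(pdf_url) else "blocked")
```

The standard-library parser does plain path-prefix matching, so this sketch blocks a folder rather than relying on a wildcard pattern for the `.pdf` extension.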