PDF best practices: to get them indexed or not? Do they pass SEO value to the site?
-
All PDFs have landing pages, and the pages are already indexed. If we allow the PDFs to get indexed, then they'd be downloadable directly from google's results page and we would not get GA events.
The PDFs info would somewhat overlap with the landing pages info. Also, if we ever need to move content, we'd now have to redirects the links to the PDFs.
What are best practices in this area? To index or not?
What do you / your clients do and why?
Would a PDF indexed by google and downloaded directly via a link in the SER page pass SEO juice to the domain? What if it's on a subdomain, like when hosted by Pardot? (www1.example.com)
-
repeatedly noticed that google index PDF files. But only their headers, without the contents of the file itself.
If you format the file description correctly, you can do it through the PDF Architect (http://pdf-architect.ideaprog.download/) program, or any other convenient for you.
-
PDFs can be canonicalized using .htaccess. Google is usually very slow to discover and obey this but it can be done. However, if your PDF is not close to being an exact copy of the target page, Google will probably not honor the canonicalization and they will index the PDF and the html page separately.
PDFs can be optimized (given a title tag) by editing the properties of the document. Most PDF - making software has the ability to do this.
You can insert "buy buttons" and advertising in PDFs. Just make an image, paste it into the document and link it to your shopping cart or to your target document.
PDFs accumulate linkjuice and pass it to other documents.
Use the same strategies with PDFs as you would with an html page for directing visitors where you want them to go and getting them to do what you want them to do.
Some people will link to your PDF, others will grab your PDF and place it on their website (in that situation, you lose the canonical but still get juice from any embeded links), and benefit from ads and buttons that might be included. Lock the PFD with your PDF-creating software to prevent people from editing your PDF (but they can always copy/paste to get around it).
Other types of documents such as Excel spreadsheets, PowerPoint documents, Google images, etc can have embedded text, embedded links and other features that are close to equivalent to an html document.
-
PDF documents aren't written in HTML so you can't put canonical tags into PDFs. So that won't help or work. In-fact, if you are considering any types of tags of any kind for your PDFs, stop - because PDF files cannot have HTML tags embedded within them
If your PDF files have landing pages, just let those rank and let people download the actual PDF files from there if they chose to do so. In reality, it's best to convert all your PDFs to HTML and then give a download link to the PDF file in case people need it (in this day and age though, PDF is a backwards format. It's not even responsive, for people's pones - it sucks!)
The only canonical tags you could apply, would be on the landing pages (which do support HTML) pointing to the PDF files. Don't do that though, it's silly. Just convert the PDFs to HTML, then leave a download button for the old PDFs in-case anyone absolutely needs them. If the PDF and the HTML page contain similar info, it won't affect you very much.
What will affect you, is putting canonical tags on the landing pages thus making them non-canonical (and stopping the landing pages from ranking properly). You're in a situation where a perfect outcome isn't possible, but that's no reason to pick the worst outcome by 'over-adhering' to Google's guidelines. Sometimes people use Google's guidelines in ways Google didn't anticipate that they would
PDF documents don't usually pass PageRank at all, as far as I know
If you want to optimise the PDF documents themselves, the document title which you save them with is used in place of a <title>tag (which, since PDFs aren't in HTML, they can't use <title>). You can kind of optimise PDF documents by editing their document titles, but it's not super effective and in the end HTML conversions usually perform much better. As stated, for the old fossils who still like / need PDF, you can give them a download link</p> <p>In the case of downloadable PDF files with similar content to their connected landing pages, Google honestly don't care too much at all. Don't go nutty with canonical tags, don't stop your landing pages from ranking by making them non-canonical</p></title>
-
Yes, the PDFs would help increase your domain rank as they are practically considered as pages by Google, as explained in their QnA here.
Regarding hosting the PDFs on a subdomain, Google has stated that it's almost the same as having them on a subfolder, but that is highly contested by everyone since it's much harder to rank a subdomain than a subfolder.
Regarding the canonical tags, they are created for "Similar or Duplicate Pages", so the content doesn't have to be identical, and you'll be good so long as most of the content is the same. Otherwise, you can safely have them both be and have backlinks linking from the pdf to the main content to transfer "link juice", as they are considered as valid links.
I hope my response was beneficial to you and that the included proof was substantial.
Daniel Rika
-
Thank you.
Could you address my question about what's best practice? What do most companies do?
I am not sure what the best choice would be for us -- to expose PDFs which compete with their own landing pages or not.
Also, do you know if PDFs pass SEO "juice" to the main domain? Even if they are hosted at www2.maindomain.com?
Where can I see some proof that this is the case?
If the PDFs have a canonical tag pointing to the parent page, wouldn't this be confusing for the search engines as these are two separate files with differing content? Canonical tags are usually used to eliminate duplicates for differing URLs with identical content.
-
Whether you want to index the pdf directly or not will mostly depend on the content of the pdf:
- If you are using the pdf as a way to gather e-mails for your newsletter, or if you are offering the pdf as a way to get users to your site, then it would be best not to have them indexed directly, but instead have the users go to your site first.
- If the pdf in itself is a way for you to promote your website or content then you can index it so that it can be accessed directly and may help you to get a bit more rank or clicks.
If you are looking to track pdf views, there are options to connect GA and track your pdf views, such as this plugin.
If the content is similar to the web page, then you can put a canonical tag to transfer the ranking. You can add it to the http header using the .htaccess file as explained here.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Should Google Trends Match Organic Traffic to My Site?
When looking at Google Trends and my Organic Traffic (using GA) as percentages of their total yearly values I have a correlation of .47. This correlation doesn't seem right when you consider that Google Trends (which is showing relative search traffic data) should match up pretty strongly to your Organic Traffic. Any thoughts on what might be going on? Why isn't Google Trends correlating with Organic Traffic? Shouldn't they be pulling from the same data set? Thanks, Jacob
Reporting & Analytics | | jacob.young.cricut0 -
Free Media Site / High Traffic / Low Engagement / Strategies and Questions
Hi, Imagine a site "mediapalooza dot com" where the only thing you do there is view free media. Yet Google Analytics is showing the average view of a media page is about a minute; where the average length of media is 20 - 90 minutes. And imagine that most of this media is "classic" and that it is generally not available elsewhere. Note also that the site ranks terribly in Google, despite having decent Domain Authority (in the high 30's), Page Authority in the mid 40's and a great site and otherwise quite active international user base with page views in the tens of thousands per month. Is it possible that GA is not tracking engagement (time on site) correctly? Even accounting for the imperfect method of GA that measures "next key pressed" as a way to terminate the page as a way to measure time on page, our stats are truly abysmal, in the tenths of a percentage point of time measured when compared with actual time we think the pages are being used. If so, will getting engagement tracking to more accurately measure time on specif pages and site signal Google that this site is actually more important than current ranking indicates? There's lots of discussion about "dwell time" as this relates to ranking, and I'm postulating that if we can show Google that we have extremely good engagement instead of the super low stats that we are reporting now, then we might get a boost in ranking. Am I crazy? Has anyone got any data that proves or disproves this theory? as I write this out, I detect many issues - let's have a discussion on what else might be happening here. We already know that low engagement = low ranking. Will fixing GA to show true engagement have any noticeable impact on ranking? Can't wait to see what the MOZZERS think of this!
Reporting & Analytics | | seo_plus0 -
Is it possible to get demographic and interest information from DoubleClick cookies?
We use Google Analytics and we are currently extracting information from the Google Analytics cookies about our visitors. Is it possible to access DoubleClick cookies in a similiar fashion and get some demographic/interest information for each visitor to our website (if they have a DoubleClick cookie set)? If so, any information on how to retrieve it would be very appreciated.
Reporting & Analytics | | WebpageFX0 -
Any harm and why the differences - multiple versions of same site in WMT
In Google Webmaster Tools we have set up: ourdomain.co.nz
Reporting & Analytics | | zingseo
ourdomain.co.uk
ourdomain.com
ourdomain.com.au
www.ourdomain.co.nz
www.ourdomain.co.uk
www.ourdomain.com
www.ourdomain.com.au
https://www.ourdomain.co.nz
https://www.ourdomain.co.uk
https://www.ourdomain.com
https://www.ourdomain.com.au As you can imagine, this gets confusing and hard to manage. We are wondering whether having all these domains set up in WMT could be doing any damage? Here http://support.google.com/webmasters/bin/answer.py?hl=en&answer=44231 it says: "If you see a message that your site is not indexed, it may be because it is indexed under a different domain. For example, if you receive a message that http://example.com is not indexed, make sure that you've also added http://www.example.com to your account (or vice versa), and check the data for that site." The above quote suggests that there is no harm in having several versions of a site set up in WMT, however the article then goes on to say: "Once you tell us your preferred domain name, we use that information for all future crawls of your site and indexing refreshes. For instance, if you specify your preferred domain as http://www.example.com and we find a link to your site that is formatted as http://example.com, we follow that link as http://www.example.com instead." This suggests that having multiple versions of the site loaded in WMT may cause Google to continue crawling multiple versions instead of only crawling the desired versions (https://www.ourdomain.com + .co.nz, .co.uk, .com.au). However, even if Google does crawl any URLs on the non https versions of the site (ie ourdomain.com or www.ourdomain.com), these 301 to https://www.ourdomain.com anyway... so shouldn't that mean that google effectively can not crawl any non https://www versions (if it tries to they redirect)? If that was the case, you'd expect that the ourdomain.com and www.ourdomain.com versions would show no pages indexed in WMT, however the oposite is true. The ourdomain.com and www.ourdomain.com versions have plenty of pages indexed but the https versions have no data under Index Status section of WMT, but rather have this message instead: Data for https://www.ourdomain.com/ is not available. Please try a site with http:// protocol: http://www.ourdomain.com/. This is a problem as it means that we can't delete these profiles from our WMT account. Any thoughts on the above would be welcome. As an aside, it seems like WMT is picking up on the 301 redirects from all ourdomain.com or www.ourdomain.com domains at least with links - No ourdomain.com or www.ourdomain.com URLs are registering any links in WMT, suggesting that Google is seeing all links pointing to URLs on these domains as 301ing to https://www.ourdomain.com ... which is good, but again means we now can't delete https://www.ourdomain.com either, so we are stuck with 12 profiles in WMT... what a pain.... Thanks for taking the time to read the above, quite complicated, sorry!! Would love any thoughts...0 -
Looking for an Automated SEO report Software Solution
Buon Giorno from 4 degrees C mostly cloudy Wetherby UK 🙂 I love Google Analytics but I'm bogged down with analytics report writting. I'm looking for a web analytics softeare package that: 1. White Label ie we can brand the reports up
Reporting & Analytics | | Nightwing
2. Bespoke ie i can pick and choose what I report on
3. Automated ie I can set a time & date when the client receives the report. Any recommendations appreciated 🙂 Grazie tanto, David0 -
No Link Data Available for this URL appears often. Are my sites too small to show up?
I am on the trial until Mar 5, 2012. I seldom get the info I want. Is it because my sites are too narrow a niche? I don't seem to be getting the data I'd like from your service. I'm trying to like it, but when I keep getting messages like this, it makes it hard to justify: "No Link Data Available for this URL appears often" Sample sites that I am unable to get data. I especially would like to know how many backlinks exist for each site. I paid someone to help me with them and I'd like to verify their work.: http://costaricadentistreview.com/ http://costaricadentistreviews.com/ http://costaricadentalimplants.org Any suggestions? Thanx Kurt Gross
Reporting & Analytics | | kurtray0 -
.com version and .org version of site
So i just discovered that a site I now managae has a .com version - as well as the .org version that is the one everyone knows about! I'm guessing this is not a good thing... So the whole site eg www.abc.org/example has a mirror page www.abc.com/example.... What should I do about this? Is it really bad to have 2 versions out there? Thanks!
Reporting & Analytics | | inhouseninja0 -
GoDaddy SEO Services?
So one of my client's wants to know why he shouldn't just hire GoDaddy to "do SEO" for him! He found this page: http://www.godaddy.com/search-engine/seo-services.aspx Without actually trying it, It looks like a bunch of stuff I already do (rank tracking, PR tracking, traffic tracking, ROI projections), stuff I shouldn't do (site submissions to search engines), and stuff I probably don't need ("keyword wizard"). That said, I am intrigued by: the dedicated phone number with track and monitor (is it better than Hosted Numbers?), the business listing (is it a link worth the $6/mo?), site analysis and optimization, and ROI dashboard. Does anyone have any experience with this? Does any piece of it have any value?
Reporting & Analytics | | TheEspresseo1