PDF best practices: to get them indexed or not? Do they pass SEO value to the site?
-
All PDFs have landing pages, and the pages are already indexed. If we allow the PDFs to get indexed, then they'd be downloadable directly from Google's results page and we would not get GA events.
The PDFs' info would somewhat overlap with the landing pages' info. Also, if we ever need to move content, we'd now have to redirect the links to the PDFs.
What are best practices in this area? To index or not?
What do you / your clients do and why?
Would a PDF indexed by Google and downloaded directly via a link in the SERP pass SEO juice to the domain? What if it's on a subdomain, like when hosted by Pardot? (www1.example.com)
-
I have repeatedly noticed that Google indexes PDF files, but only their headers, not the contents of the file itself.
To format the file description correctly, you can use the PDF Architect (http://pdf-architect.ideaprog.download/) program, or any other program that is convenient for you.
-
PDFs can be canonicalized using .htaccess. Google is usually very slow to discover and obey this but it can be done. However, if your PDF is not close to being an exact copy of the target page, Google will probably not honor the canonicalization and they will index the PDF and the html page separately.
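To make the canonicalization point a bit more concrete: a PDF has no HTML head, so the canonical hint is sent as an HTTP `Link` response header, and the .htaccess rule is just one way of attaching that header. Here is a minimal Python sketch of what the header value looks like. The function name and the example.com URL are made up for illustration, and this only builds the string; how you attach it depends on your server.

```python
def canonical_link_header(canonical_url):
    """Build the value of an HTTP Link header that names canonical_url
    as the canonical version of the resource being served (e.g. a PDF)."""
    # The target is expected to be an absolute URL, so reject anything else.
    if not canonical_url.startswith(("http://", "https://")):
        raise ValueError("canonical URL should be absolute")
    return '<{}>; rel="canonical"'.format(canonical_url)


# Hypothetical landing page that the PDF should point to.
print("Link:", canonical_link_header("https://www.example.com/whitepaper/"))
```

Whatever server or CDN you use, the idea is that the response for the PDF URL carries this header pointing at the landing page.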
PDFs can be optimized (given a title tag) by editing the properties of the document. Most PDF-making software has the ability to do this.
You can insert "buy buttons" and advertising in PDFs. Just make an image, paste it into the document and link it to your shopping cart or to your target document.
PDFs accumulate link juice and pass it to other documents.
Use the same strategies with PDFs as you would with an html page for directing visitors where you want them to go and getting them to do what you want them to do.
Some people will link to your PDF; others will grab your PDF and place it on their website. In that situation you lose the canonical but still get juice from any embedded links, and you benefit from any ads and buttons included in the document. Lock the PDF with your PDF-creating software to prevent people from editing it (though they can always copy and paste to get around that).
Other types of documents, such as Excel spreadsheets, PowerPoint documents, Google images, etc., can have embedded text, embedded links and other features that are close to equivalent to an HTML document.
-
PDF documents aren't written in HTML, so you can't put canonical tags into PDFs. So that won't help or work. In fact, if you are considering tags of any kind for your PDFs, stop, because PDF files cannot have HTML tags embedded within them.
If your PDF files have landing pages, just let those rank and let people download the actual PDF files from there if they choose to do so. In reality, it's best to convert all your PDFs to HTML and then give a download link to the PDF file in case people need it (in this day and age though, PDF is a backwards format. It's not even responsive for people's phones - it sucks!)
The only canonical tags you could apply would be on the landing pages (which do support HTML), pointing to the PDF files. Don't do that though, it's silly. Just convert the PDFs to HTML, then leave a download button for the old PDFs in case anyone absolutely needs them. If the PDF and the HTML page contain similar info, it won't affect you very much.
What will affect you is putting canonical tags on the landing pages, thus making them non-canonical (and stopping the landing pages from ranking properly). You're in a situation where a perfect outcome isn't possible, but that's no reason to pick the worst outcome by 'over-adhering' to Google's guidelines. Sometimes people use Google's guidelines in ways Google didn't anticipate that they would.
PDF documents don't usually pass PageRank at all, as far as I know.
If you want to optimise the PDF documents themselves, the document title which you save them with is used in place of a <title> tag (since PDFs aren't in HTML, they can't use <title>). You can kind of optimise PDF documents by editing their document titles, but it's not super effective, and in the end HTML conversions usually perform much better. As stated, for the old fossils who still like / need PDF, you can give them a download link.
In the case of downloadable PDF files with similar content to their connected landing pages, Google honestly doesn't care too much at all. Don't go nutty with canonical tags, and don't stop your landing pages from ranking by making them non-canonical.
-
Yes, the PDFs would help increase your domain rank as they are practically considered as pages by Google, as explained in their QnA here.
Regarding hosting the PDFs on a subdomain, Google has stated that it's almost the same as having them on a subfolder, but that is highly contested by everyone since it's much harder to rank a subdomain than a subfolder.
Regarding the canonical tags, they are created for "Similar or Duplicate Pages", so the content doesn't have to be identical, and you'll be good so long as most of the content is the same. Otherwise, you can safely have them both be indexed and have links from the PDF to the main content to transfer "link juice", as they are considered valid links.
I hope my response was beneficial to you and that the included proof was substantial.
Daniel Rika
-
Thank you.
Could you address my question about what's best practice? What do most companies do?
I am not sure what the best choice would be for us -- to expose PDFs which compete with their own landing pages or not.
Also, do you know if PDFs pass SEO "juice" to the main domain? Even if they are hosted at www2.maindomain.com?
Where can I see some proof that this is the case?
If the PDFs have a canonical tag pointing to the parent page, wouldn't this be confusing for the search engines as these are two separate files with differing content? Canonical tags are usually used to eliminate duplicates for differing URLs with identical content.
-
Whether you want to index the pdf directly or not will mostly depend on the content of the pdf:
- If you are using the pdf as a way to gather e-mails for your newsletter, or if you are offering the pdf as a way to get users to your site, then it would be best not to have them indexed directly, but instead have the users go to your site first.
- If the pdf in itself is a way for you to promote your website or content then you can index it so that it can be accessed directly and may help you to get a bit more rank or clicks.
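If you go with the first option and keep the PDFs out of the index, the usual mechanism is an `X-Robots-Tag: noindex` HTTP response header on the PDF URLs, since a PDF has no place for a robots meta tag. A rough Python sketch of that decision, assuming you control the response headers somewhere in your stack; the function name, the `index_pdfs` flag and the paths are hypothetical:

```python
def robots_headers_for(path, index_pdfs):
    """Return the extra response headers to send for a requested path.

    PDFs get "X-Robots-Tag: noindex" when index_pdfs is False;
    everything else (e.g. the landing pages) is left untouched.
    """
    headers = {}
    if path.lower().endswith(".pdf") and not index_pdfs:
        headers["X-Robots-Tag"] = "noindex"
    return headers


# Gated PDF: keep it out of the index so visitors go through the landing page.
print(robots_headers_for("/downloads/guide.pdf", index_pdfs=False))
# Landing page: no extra header, so it stays indexable.
print(robots_headers_for("/guide/", index_pdfs=False))
```

Keep in mind the crawler has to be able to fetch the PDF to see this header, so blocking the same URLs in robots.txt would hide it.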
If you are looking to track pdf views, there are options to connect GA and track your pdf views, such as this plugin.
If the content is similar to the web page, then you can set a canonical to transfer the ranking. You can add it to the HTTP header using the .htaccess file as explained here.
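Once the header is in place, it is worth checking what the PDF URL actually returns. The sketch below assumes you have already copied the raw `Link` header value from the response (from browser dev tools or a header checker) and just pulls out the canonical target. The function name and URLs are made up, and the parsing is deliberately simple: it splits on commas, so it would trip over a URL that itself contains a comma.

```python
import re


def canonical_from_link_header(link_header):
    """Return the URL marked rel="canonical" in a Link header value, or None."""
    for part in link_header.split(","):
        # Each entry looks like: <url>; param=value; ...
        match = re.match(r"\s*<([^>]*)>\s*;(.*)", part)
        if match and re.search(r'rel\s*=\s*"?canonical"?', match.group(2), re.IGNORECASE):
            return match.group(1)
    return None


# Header value as it might appear on a PDF response (hypothetical URL).
value = '<https://www.example.com/whitepaper/>; rel="canonical"'
print(canonical_from_link_header(value))
```

If this returns None for your PDF URL, the .htaccess rule is probably not matching the file.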