Moz Q&A is closed.
After more than 13 years and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content (many posts will still be viewable), we have locked both new posts and new replies.
PDF best practices: to get them indexed or not? Do they pass SEO value to the site?
-
All PDFs have landing pages, and the pages are already indexed. If we allow the PDFs to get indexed, then they'd be downloadable directly from Google's results page and we would not get GA events.
The PDFs' info would somewhat overlap with the landing pages' info. Also, if we ever need to move content, we'd now have to redirect the links to the PDFs.
What are best practices in this area? To index or not?
What do you / your clients do and why?
Would a PDF indexed by Google and downloaded directly via a link in the SERP pass SEO juice to the domain? What if it's on a subdomain, like when hosted by Pardot? (www1.example.com)
-
I have repeatedly noticed that Google indexes PDF files, but only their headers, without the contents of the file itself.
You can format the file description correctly with PDF Architect (http://pdf-architect.ideaprog.download/) or any other program that is convenient for you.
-
PDFs can be canonicalized using .htaccess. Google is usually very slow to discover and obey this, but it can be done. However, if your PDF is not close to being an exact copy of the target page, Google will probably not honor the canonicalization and will index the PDF and the HTML page separately.
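Since a PDF has no HTML head, the canonical hint has to travel in an HTTP `Link` response header, which is what the .htaccess rule would need to emit. The following is a minimal Python sketch of the header value only, not the server configuration itself. The function names, the path mapping, and the example.com URLs are hypothetical, chosen purely for illustration.

```python
from typing import Dict

# Hypothetical mapping from a PDF path to the landing page it should
# be canonicalized to. The paths and URLs are illustrative only.
PDF_TO_LANDING_PAGE: Dict[str, str] = {
    "/downloads/white-paper.pdf": "https://www.example.com/white-paper/",
}


def canonical_link_header(canonical_url: str) -> str:
    """Return the value of a Link header that declares a canonical URL."""
    if not canonical_url.startswith(("http://", "https://")):
        raise ValueError("the canonical URL should be absolute")
    return f'<{canonical_url}>; rel="canonical"'


def headers_for(path: str) -> Dict[str, str]:
    """Return the extra response headers to send for a given PDF path."""
    target = PDF_TO_LANDING_PAGE.get(path)
    if target is None:
        return {}  # no canonical target known, so send nothing extra
    return {"Link": canonical_link_header(target)}


print(headers_for("/downloads/white-paper.pdf"))
```

Whatever server or .htaccess rule is used, the response for the PDF would need to carry a header of this shape for the canonical hint to be visible to crawlers.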
PDFs can be optimized (given a title tag) by editing the properties of the document. Most PDF-making software has the ability to do this.
You can insert "buy buttons" and advertising in PDFs. Just make an image, paste it into the document and link it to your shopping cart or to your target document.
PDFs accumulate link juice and pass it to other documents.
Use the same strategies with PDFs as you would with an html page for directing visitors where you want them to go and getting them to do what you want them to do.
Some people will link to your PDF; others will grab your PDF and place it on their website. In that situation you lose the canonical, but you still get juice from any embedded links and benefit from any ads and buttons that might be included. Lock the PDF with your PDF-creating software to prevent people from editing it (but they can always copy/paste to get around that).
Other types of documents, such as Excel spreadsheets, PowerPoint documents, Google images, etc., can have embedded text, embedded links and other features that are close to equivalent to an HTML document.
-
PDF documents aren't written in HTML, so you can't put canonical tags into PDFs. That won't help or work. In fact, if you are considering tags of any kind for your PDFs, stop, because PDF files cannot have HTML tags embedded within them.
If your PDF files have landing pages, just let those rank and let people download the actual PDF files from there if they choose to do so. In reality, it's best to convert all your PDFs to HTML and then give a download link to the PDF file in case people need it (in this day and age, though, PDF is a backwards format. It's not even responsive for people's phones. It sucks!)
The only canonical tags you could apply would be on the landing pages (which do support HTML), pointing to the PDF files. Don't do that though, it's silly. Just convert the PDFs to HTML, then leave a download button for the old PDFs in case anyone absolutely needs them. If the PDF and the HTML page contain similar info, it won't affect you very much.
What will affect you is putting canonical tags on the landing pages, thus making them non-canonical (and stopping the landing pages from ranking properly). You're in a situation where a perfect outcome isn't possible, but that's no reason to pick the worst outcome by 'over-adhering' to Google's guidelines. Sometimes people use Google's guidelines in ways Google didn't anticipate that they would.
PDF documents don't usually pass PageRank at all, as far as I know.
If you want to optimise the PDF documents themselves, the document title which you save them with is used in place of a <title> tag (which PDFs can't use, since they aren't HTML). You can kind of optimise PDF documents by editing their document titles, but it's not super effective, and in the end HTML conversions usually perform much better. As stated, for the old fossils who still like / need PDF, you can give them a download link.
In the case of downloadable PDF files with similar content to their connected landing pages, Google honestly doesn't care too much at all. Don't go nutty with canonical tags, and don't stop your landing pages from ranking by making them non-canonical.
-
Yes, the PDFs would help increase your domain rank as they are practically considered as pages by Google, as explained in their QnA here.
Regarding hosting the PDFs on a subdomain, Google has stated that it's almost the same as having them on a subfolder, but that is highly contested by everyone since it's much harder to rank a subdomain than a subfolder.
Regarding the canonical tags, they are created for "Similar or Duplicate Pages", so the content doesn't have to be identical, and you'll be good so long as most of the content is the same. Otherwise, you can safely have them both indexed and add links from the PDF to the main content to transfer "link juice", as they are considered valid links.
I hope my response was beneficial to you and that the included proof was substantial.
Daniel Rika
-
Thank you.
Could you address my question about what's best practice? What do most companies do?
I am not sure what the best choice would be for us -- to expose PDFs which compete with their own landing pages or not.
Also, do you know if PDFs pass SEO "juice" to the main domain? Even if they are hosted at www2.maindomain.com?
Where can I see some proof that this is the case?
If the PDFs have a canonical tag pointing to the parent page, wouldn't this be confusing for the search engines as these are two separate files with differing content? Canonical tags are usually used to eliminate duplicates for differing URLs with identical content.
-
Whether you want to index the PDF directly or not will mostly depend on the content of the PDF:
- If you are using the PDF as a way to gather e-mails for your newsletter, or if you are offering the PDF as a way to get users to your site, then it would be best not to have it indexed directly, but instead have the users go to your site first.
- If the PDF in itself is a way for you to promote your website or content, then you can index it so that it can be accessed directly and may help you to get a bit more rank or clicks.
If you are looking to track PDF views, there are options to connect GA and track your PDF views, such as this plugin.
If the content is similar to the web page, then you can put a canonical tag to transfer the ranking. You can add it to the HTTP header using the .htaccess file as explained here.
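Once such a header has been configured, it is worth checking that the PDF response actually carries it. The sketch below is a rough Python helper that pulls the canonical target out of a raw `Link` header value. The function name and the sample URLs are hypothetical, and the parsing is deliberately simplified rather than a full implementation of the header grammar.

```python
import re
from typing import Optional

# Matches one "<url>; param=value; ..." entry inside a Link header value.
_LINK_ENTRY = re.compile(r'<([^>]*)>\s*((?:;\s*[^;,]+)*)')


def canonical_from_link_header(value: str) -> Optional[str]:
    """Return the URL marked rel="canonical" in a Link header, if any."""
    for match in _LINK_ENTRY.finditer(value):
        url, params = match.group(1), match.group(2)
        for param in params.split(";"):
            name, _, raw = param.strip().partition("=")
            # rel may hold several space-separated relation types
            relations = raw.strip().strip('"').lower().split()
            if name.strip().lower() == "rel" and "canonical" in relations:
                return url
    return None


# Sample header value, as it might be copied from a PDF's response headers.
sample = '<https://www.example.com/white-paper/>; rel="canonical"'
print(canonical_from_link_header(sample))
```

If the helper returns `None` for the header served with a PDF, the canonical hint is not reaching crawlers and the .htaccess rule needs another look.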