Index PDF files but redirect to site
-
Hi,
One of our clients has tons of PDFs (manuals, etc.) and frequently gets good rankings for the direct PDF links. While we're happy that the PDFs attract users' attention, we'd like to redirect visitors to the page on the site where the PDF link is published, so that people don't open the PDF directly.
In short, we'd like the PDFs to be indexed, but show users the PDF link within a page on the site. How should we proceed?
Thanks,
GM
-
Thanks for the follow-up ... if it weren't for phrases like
- The page displayed to all users who visit from Google must be identical to the content that is shown to Googlebot.
I'd be quite comfortable with that ... in the meantime, however, I might try some pdf2html conversion tools to see if there is a viable way to present the PDF information on an HTML page and block the PDF link for robots.
Regards,
Gert
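For the "block the PDF link for robots" part, one possible approach is an X-Robots-Tag response header set from .htaccess. This is only a minimal sketch: it assumes an Apache server with mod_headers enabled and that every file ending in .pdf should be kept out of the index, neither of which is stated in the thread.

```apache
# Assumption: Apache with mod_headers available; the rule is skipped otherwise.
<IfModule mod_headers.c>
  # Match every file whose name ends in .pdf
  <FilesMatch "\.pdf$">
    # Ask search engines not to index the PDF itself
    Header set X-Robots-Tag "noindex"
  </FilesMatch>
</IfModule>
```

Note that the PDFs must stay crawlable for the header to be seen; a robots.txt Disallow on the same files would prevent crawlers from ever reading it.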
-
Hi Gert,
After further research, this might not be considered cloaking after all, since Google's First Click Free for Web Search system works the same way and checks the HTTP referer.
For more details, read the official Google Webmaster Central blog post about it here:
http://googlewebmastercentral.blogspot.com/2008/10/first-click-free-for-web-search.html
Best regards,
Guillaume Voyer.
-
Thanks for your detailed reply, Guillaume,
I guess the possible "cloaking troubles" with this strategy are probably too risky for our project. However, I like the "click here" idea; we'll check whether we can automate that somehow to draw users reading the PDFs back to our site.
-
Hi Gert,
Technically, this is not possible unless you use cloaking to display the PDF to the search engines and redirect users to a different page.
What you could do to avoid cloaking is include a banner at the top of your PDF with something like "Click here to see all our related PDFs" that links to your website; this way, users might be interested in going to your website.
Otherwise, you could detect the referer with .htaccess and redirect the user to the site if he is coming from Google, but this might be considered cloaking. Here's an example:
RewriteEngine On
RewriteCond %{HTTP_REFERER} (.*)google\.(.*)
RewriteRule ^pdf/(.*)\.pdf$ /pdf-list [R=302,L]
If you are running an Apache server and you put this in your .htaccess file, the first line activates mod_rewrite, the second line checks whether the referer matches anything, then "google.", then anything, and the third line redirects all .pdf files in the pdf folder to the /pdf-list page if the referer matches.
Best regards,
Guillaume Voyer.
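As a variation on the same idea, the rule could send each visitor to a page for that specific PDF instead of one shared list. This is only a sketch under assumptions: the /manuals/ path and the one-page-per-PDF naming are hypothetical, the .htaccess file is assumed to sit in the document root, and the same cloaking caveat applies.

```apache
RewriteEngine On
# Match referers whose host is google.<tld> or a subdomain of it, case-insensitively
RewriteCond %{HTTP_REFERER} ^https?://([^/]+\.)?google\.[a-z.]+(/|$) [NC]
# Hypothetical mapping: /pdf/some-manual.pdf -> /manuals/some-manual
RewriteRule ^pdf/(.+)\.pdf$ /manuals/$1 [R=302,L]
```

Requests without a referer, which includes typical crawler requests, do not match the condition and still receive the PDF itself.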