Stop Google indexing CDN pages

loopyal

Just when I thought I'd seen it all, Google hits me with another nasty surprise!

I have a CDN to deliver images, JS and CSS to visitors around the world. As far as I can tell, I have no links to static HTML pages on the site, but someone else may have, perhaps a scraper site?

Google has decided that the static pages it was able to access through the CDN have more value than my real pages, and it seems to be slowly replacing my pages in the index with the static pages.

Anyone got an idea on how to stop that?

Obviously, I have no access to the static area, because it is in the CDN, so there is no way I know of that I can have a robots file there.

It could be that I have to trash the CDN and change it to only allow the image directory, and maybe set up a separate CDN subdomain for content that only contains the JS and CSS?

Have you seen this problem and beaten it?

(Of course the next thing is Roger might look at Google results and start crawling them too, LOL)

P.S. The reason I am not asking this in the Google forums is that others have asked it there many times over the past 5 months. Nobody at Google has bothered to answer, and nobody who did try gave an answer that was remotely useful. So I'm not really hopeful of anyone here having a solution either, but I expect this is my best bet because you guys are always willing to try.

loopyal

Thank you Edward.

I don't have quite that problem, but I think you are right too.

My CDN is set up to be Origin Pull.

That means there is no need to FTP - the system just fetches content as requested.

You should check that out if you have to FTP everything.

But what you said that helped me is this: I should have had one CNAME for images and another CNAME for content, with the content one limited to a folder called content. I can put the CSS files and the JS files in that folder, and that way the plain HTML pages at the root level will never be affected.
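A rough sketch of that restriction, in Python purely for illustration. The idea is that whatever answers the CDN's origin requests only hands over files under an asset folder and refuses everything else, so root-level HTML never gets a second URL. The folder name "/images/", the extension list and the function name are assumptions, not taken from any particular CDN's settings; only the "content" folder comes from the post above.

```python
import posixpath

# Assumed layout: "/content/" holds CSS and JS (as described above),
# "/images/" is a guessed name for the image directory.
ALLOWED_PREFIXES = ("/images/", "/content/")
# Assumed list of static asset extensions; HTML is deliberately absent.
ALLOWED_EXTENSIONS = {".css", ".js", ".png", ".jpg", ".jpeg", ".gif"}


def cdn_may_serve(path: str) -> bool:
    """Return True only for asset files inside the allowed folders."""
    # Collapse "." and ".." so "/content/../index.html" cannot sneak through.
    normalized = posixpath.normpath(path)
    if not normalized.startswith(ALLOWED_PREFIXES):
        return False
    extension = posixpath.splitext(normalized)[1].lower()
    return extension in ALLOWED_EXTENSIONS
```

With this check, "/content/site.css" would be served to the CDN while "/index.html" would not, which is the separation being described.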

I also realized, while checking the system, that I wasn't using a canonical tag in the intermediate pages, as I was in the story pages. So I just added code to add canonical tags for all the intermediate pages and the front page.
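For anyone following along, here is a minimal sketch of the canonical tag idea in Python. It is not the code actually used on the site. The hostname www.example.com, the https scheme and the function name are placeholders, and dropping the query string is an assumption that may not suit every page type.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Placeholder: the hostname that should end up in the index.
CANONICAL_HOST = "www.example.com"


def canonical_tag(requested_url: str,
                  canonical_host: str = CANONICAL_HOST,
                  scheme: str = "https") -> str:
    """Build a <link rel="canonical"> tag pointing at the main site,
    whichever hostname (for example the CDN's) the page was fetched through."""
    path = urlsplit(requested_url).path or "/"
    # Keep only the path; force the scheme and host, drop query and fragment.
    url = urlunsplit((scheme, canonical_host, path, "", ""))
    return f'<link rel="canonical" href="{escape(url, quote=True)}">'
```

Because the tag is built from the path alone, a copy of the page fetched through the CDN hostname carries the same canonical URL as the real page.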

I do have a few other types of pages, so I will handle the code for them next.

I think adding the canonical tag might fix the problem, but I will also work on reconfiguring the CDN and switch over at a quiet time, in case it takes a while to propagate.

edwardlewis

It sounds like you have set up your CDN slightly wrong.

After setting up a few like you have, I realised that I was actually making a complete duplicate of the site rather than just the images or assets.

I imagine you have your origin directory for the CDN in the public html folder.

Create a subdomain, set that as the origin.

E.g. I'm working on this site at the moment: http://looksfishy.co.uk/

I have a subdomain called assets: http://assets.looksfishy.co.uk/

The CDN content: http://cdn.looksfishy.co.uk/

Files uploaded here:

http://assets.looksfishy.co.uk/species/holder/pike.jpg

Displayed here:

http://cdn.looksfishy.co.uk/species/holder/pike.jpg

Check the IP addresses on them.

It does make uploading images by FTP a bit of a faff, but it does make your site better.
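If you'd rather script that check than look the addresses up by hand, something like this Python sketch would do it. The function names are made up for illustration, and it only compares the first IPv4 address each name resolves to, which is a simplification.

```python
import socket
from typing import Callable, Dict, Iterable


def resolve_all(hostnames: Iterable[str],
                resolver: Callable[[str], str] = socket.gethostbyname
                ) -> Dict[str, str]:
    """Map each hostname to the IPv4 address the resolver returns for it."""
    return {name: resolver(name) for name in hostnames}


def served_separately(origin_host: str, cdn_host: str,
                      resolver: Callable[[str], str] = socket.gethostbyname
                      ) -> bool:
    """True when the origin and CDN hostnames resolve to different addresses."""
    ips = resolve_all([origin_host, cdn_host], resolver)
    return ips[origin_host] != ips[cdn_host]


# Example call (needs network access):
# served_separately("assets.looksfishy.co.uk", "cdn.looksfishy.co.uk")
```

Different addresses suggest the cdn hostname really is answered by the CDN rather than by the same server as the assets subdomain.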
