Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Stop google indexing CDN pages
-
Just when I thought I'd seen it all, google hits me with another nasty surprise!
I have a CDN to deliver images, js and css to visitors around the world. I have no links to static HTML pages on the site, as far as I can tell, but someone else may have - perhaps a scraper site?
Google has decided the static pages they were able to access through the CDN have more value than my real pages, and they seem to be slowly replacing my pages in the index with the static pages.
Anyone got an idea on how to stop that?
Obviously, I have no access to the static area, because it is in the CDN, so there is no way I know of that I can have a robots file there.
It could be that I have to trash the CDN and change it to only allow the image directory, and maybe set up a separate CDN subdomain for content that only contains the JS and CSS?
Have you seen this problem and beat it?
(Of course the next thing is Roger might look at google results and start crawling them too, LOL)
P.S. The reason I am not asking this question in the google forums is that others have asked this question many times and nobody at google has bothered to answer, over the past 5 months, and nobody who did try, gave an answer that was remotely useful. So I'm not really hopeful of anyone here having a solution either, but I expect this is my best bet because you guys are always willing to try.
-
Thank you Edward.
I don't have quite that problem, but I think you are right too.
My CDN is set up to be Origin Pull.
That means there is no need to FTP - the system just fetches content as requested.
- you should check that out if you have to ftp everything.
But what you said that helped me is this - that I should have had one CNAME for images and anotehr CNAME for content and the content should be limited to a folder called content, so I can put the CSS files and the JS files in it and that way, the plain HTML pages at teh root level will never be affected.
I also realized, while checking the system, that I wasn't using a canonical tag in the intermediate pages, as I was in the story pages. So I just added code to add canonical tags for all the intermediate pages and the front page.
I do have a few other types of pages, so I will handle the code for them next.
I think adding the canonical tag might fix the problem, but I will also work on reconfiguring the CDN and change over when the action is not too busy, in case it takes a while to propagate.
-
It sounds like you have set up your CDN slightly wrong.
After setting up a few like you have I realised that I was actually making a complete duplicate of the site rather than just the images or assets
I imagine you have your origin directory for the CDN in the public html folder.
Create a subdomain, set that as the origin.
Eg.. I'm working on this site at the moment: http://looksfishy.co.uk/
I have a subdomain called assets: http://assets.looksfishy.co.uk/
The cdn content: http://cdn.looksfishy.co.uk/
Files uploaded here:
http://assets.looksfishy.co.uk/species/holder/pike.jpg
Displayed here:
http://cdn.looksfishy.co.uk/species/holder/pike.jpg
Check the ip address on them.
It does make uploading images by ftp a bit of a faff, but does make your site better
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google Cache
So, when I gain a link I always check to see if the page that is linking is in the Google cache. I've noticed recently that more and more pages are actually not showing up in Google's cache, yet still appear in search results. I did read an article from someone whoo works at Google a few weeks back that there is sometimes an error with the cache and occasionally the cache will not display. This week, my own website isn't showing up in the cache yet I'm still ranking in SERP's. I'm not worried about it, mostly whitehat, but has there been any indication that Google are phasing out the ability to check cache's of websites?
Algorithm Updates | | ThorUK0 -
A page will not be indexed if published without linking from anywhere?
Hi all, I have noticed one page from our competitors' website which has been hardly linked from one internal page. I just would like to know if the page not linked anywhere get indexed by Google or not? Will it be found by Google? What if a page not linked internally but go some backlinks from other websites? Thanks
Algorithm Updates | | vtmoz0 -
Best and easiest Google Depersonalization method
Hello, Moz hasn't written anything about depersonalization for years. This article has methods, but I don't know if they are valid anymore. What's an easy, effective way to depersonalize Google search these days? I would just log out of Google, but that shows different ranking results than Moz's rank tracker for one of our main keywords, so I don't know if that method is correct. Thanks
Algorithm Updates | | BobGW0 -
How long for google to de-index old pages on my site?
I launched my redesigned website 4 days ago. I submitted a new site map, as well as submitted it to index in search console (google webmasters). I see that when I google my site, My new open graph settings are coming up correct. Still, a lot of my old site pages are definitely still indexed within google. How long will it take for google to drop off or "de-index" my old pages? Due to the way I restructured my website, a lot of the items are no longer available on my site. This is on purpose. I'm a graphic designer, and with the new change, I removed many old portfolio items, as well as any references to web design since I will no longer offering that service. My site is the following:
Algorithm Updates | | rubennunez
http://studio35design.com0 -
Does Google use dateModified or date Published in its SERPs?
I was curious as to the prioritization of dateCreated / datePublished and dateModified in our microdata and how it affects google search results. I have read some entries online that say Google prioritizes dateModified in SERPs, but others that claim they prioritize datePublished or dateCreated. Do you know (or could you point me to some resources) as to whether Google uses dateModified or date Published in its SERPs? Thanks!
Algorithm Updates | | Parse.ly0 -
How To Index Backlinks Easily?
I have already pinged my backlinks, While pinging individual urls but all the same backlinks are not indexed. How to index my backlinks?
Algorithm Updates | | surabhi60 -
Rankings changing every couple of MINUTES in Google?
We've been experiencing some unusual behaviour in the Google.co.uk SERPs recently... Basically, the ranking of some of our websites for certain keywords appears to be changing by the minute. For example, doing a search for "our keyword" might show us at #20. Then a few minutes later, doing the same search shows us at #14, and then the same search a few minutes later shows us at #26, and then sometimes we're not ranked at all, etc etc. I know the algorithm changes a lot, but does it really change every couple of minutes? Has anyone else experienced this kind of behaviour in the SERPs? What could be causing it to happen?
Algorithm Updates | | d4online0 -
Using Brand Name in Page titles
Is it a good practice to append our brand name at the end of every page title? We have a very strong brand name but it is also long. Right now what we are doing is saying: Product Name | Long brand name here Product Category | Long brand name here Is this the right way to do it or should we just be going with ONLY the product and category names in our page titles? Right now we often exceed the 70 character recommendation limit.
Algorithm Updates | | mlentner1