Amazon CloudFront CDN
-
Hi,
I'd like to speed up my website with the Amazon CloudFront CDN.
I created some CNAMEs, and now I have something like this:
- www.mydomain.com (my website)
- cdn1.mydomain.com
- cdn2.mydomain.com
- cdn3.mydomain.com
But now I have a lot of duplicate content: one copy per subdomain for every type of content (GIF, CSS, HTML, and so on).
Do you have any advice on how to avoid an SEO penalty?
Does Google detect CDNs? Can I help it understand mine?
Thanks,
Best regards,
Maxime
-
Hi Max,
As you know, SEOmoz uses a CDN (Content Delivery Network) to host our static content. This greatly improves the load time of our pages by distributing our content across a cloud network, and results in an improved experience for users.
If I understand your question correctly, you have set up a CDN and have created duplicate content issues.
To solve this, it's important to set up your CDN to serve only static content, like images, stylesheets and JavaScript. That is what a CDN is designed for. Do not duplicate your entire site (your HTML), as this will cause duplicate content issues.
If for some reason you need to replicate your entire HTML, then there are some steps you can take to mitigate the damage, although it's going to depend on your exact circumstances.
For example, you can set canonical tags with full (absolute) URLs so that the pages served from all your mapped CNAMEs point to your primary URL.
To revert to one copy of your HTML, you might want to put 301 redirects in place on the duplicated pages (pointing to the originals) before removing them from the CDN.
But even these aren't ideal solutions. It's best just to serve your static content, and only one version of your HTML.
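To make the canonical tag idea concrete, here is a minimal sketch of how a page template could build the tag so that every CNAME copy points back to the primary host. The host name `www.mydomain.com` comes from the question above; the function name `canonical_tag` and the choice to keep the query string are assumptions for illustration, not a description of how any particular site does it.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Assumption: the one host that should appear in search results.
PRIMARY_HOST = "www.mydomain.com"


def canonical_tag(request_url: str, primary_host: str = PRIMARY_HOST) -> str:
    """Build a <link rel="canonical"> tag that always names the primary host."""
    parts = urlsplit(request_url)
    # Keep the scheme, path and query string; swap the host and drop any fragment.
    canonical = urlunsplit(
        (parts.scheme, primary_host, parts.path or "/", parts.query, "")
    )
    return f'<link rel="canonical" href="{escape(canonical, quote=True)}">'


if __name__ == "__main__":
    print(canonical_tag("http://cdn1.mydomain.com/blog/post.html"))
```

Because the tag is part of the HTML itself, a copy of the page cached under cdn1, cdn2 or cdn3 still carries the www URL as its canonical.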
-
I don't think he answered that.
He stores data on Amazon S3 and serves pictures from the CDN (Amazon CloudFront). So he told me he has no duplicate content issues, because he only serves pictures.
But he also said: "This isn't an issue for duplicate content, unlike if you were replicating your HTML".
When you use Amazon CloudFront without Amazon S3, with your own web server as the origin, CloudFront duplicates all of your content (pictures, pages, ...).
On your website, you only link pictures to the CDN, for example http://cdn1.test.com/picture.jpg. But if Googlebot opens http://cdn1.test.com/, it will find all of your HTML content!
So I think it will be a duplicate content issue, and I don't really know the best way to fix it (not using Amazon CloudFront without Amazon S3, canonical tags, HTTP headers, ...).
Thanks
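For the "HTTP headers" option mentioned above, one possibility is a `Link` response header with `rel="canonical"`, which Google documents as an alternative to the HTML tag. The sketch below is only an illustration under assumptions: `http://www.test.com` is assumed to be the primary site behind `cdn1.test.com`, the extension list and the function name `canonical_headers` are made up for the example, and it assumes the CDN passes the origin's `Link` header through unchanged.

```python
from posixpath import splitext
from urllib.parse import urlsplit

# Assumption: file types that are meant to be served from the CDN.
STATIC_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".css", ".js", ".ico"}
# Assumption: the primary site that should be indexed.
PRIMARY_ORIGIN = "http://www.test.com"


def canonical_headers(request_target: str) -> dict:
    """Return extra response headers for a request target such as '/page.html?x=1'."""
    path = urlsplit(request_target).path or "/"
    extension = splitext(path)[1].lower()
    if extension in STATIC_EXTENSIONS:
        # Static assets are supposed to live on the CDN, so nothing is added.
        return {}
    # Pages get a canonical header naming the primary site.
    # The query string is dropped here, which is a simplification.
    return {"Link": f'<{PRIMARY_ORIGIN}{path}>; rel="canonical"'}


if __name__ == "__main__":
    print(canonical_headers("/picture.jpg"))
    print(canonical_headers("/"))
```

Since the origin sends the header on every page, the copy cached at cdn1.test.com would carry it too, without the origin needing to know which host the visitor used.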
-
Did the author's reply in the comment of the blog post answer your question, or do you still have this question?
-
Great post, but he didn't talk about duplicate content, only increasing speed.
-
Here's the YouMoz post that might help.
http://www.seomoz.org/ugc/improving-page-speed-with-amazon-web-services-a-beginners-guide
-
Tomorrow morning (Seattle time) I'll be posting a YouMoz blog post at http://www.seomoz.org/ugc that deals directly with setting up a CDN on Amazon. You can read through the steps given in the article and see if that answers your questions, and if not, you can ask a question in the comments.
Related Questions
-
Why is Amazon crawling my website? Is this hurting us?
Hi mozzers, I discovered that Amazon is crawling our site and exploring thousands of profile pages. In a single day it crawled 75k profile pages. Is this related to AWS? Is this something we should worry about or not? If so what could be a solution to counter this? Could this affect our Google Analytics organic traffic?
Intermediate & Advanced SEO | | Ty19860 -
Google not Indexing images on CDN.
My URL is: http://bit.ly/1H2TArH We have set up a CDN on our own domain: http://bit.ly/292GkZC We have an image sitemap: http://bit.ly/29ca5s3 The image sitemap uses the CDN URLs. We verified the CDN subdomain in GWT. The robots.txt does not restrict any of the photos: http://bit.ly/29eNSXv. We used to have a disallow to /thumb/ which had a 301 redirect to our CDN but we removed both the disallow in the robots.txt as well as the 301. Yet, GWT still reports none of our images on the CDN are indexed.
The above screenshot is from the GWT of our main domain. The GWT from the CDN subdomain just shows 0. We did not submit a sitemap to the verified subdomain property because we already have a sitemap submitted to the property on the main domain name. While making a search of images indexed from our CDN, nothing comes up: http://bit.ly/293ZbC1
While checking the GWT of the CDN subdomain, I have been getting crawling errors, mainly 500 level errors. Not that many in comparison to the number of images and traffic that we get on our website. Google is crawling, but it seems like it just doesn't index the pictures!?
Can anyone help? I have followed all the information that I was able to find on the web but yet, our images on the CDN still can't seem to get indexed.
Intermediate & Advanced SEO | | alphonseha0 -
We used to speak of too many links from the same C block as bad; have CDNs like CloudFlare made that concept irrelevant?
Over lunch with our head of development, we were discussing the way CloudFlare and other CDN's help prevent DDOS attacks, etc. and I began to wonder about the IP address vs. the reverse proxy IP address. Before we would look to see commonalities in the IP as a way that search engines would modify the value to given links and most link software showed this. For ahrefs, I know they still show common IPs using the C block as the reference point. I began to get curious about what was the real IP when our head of dev said, that is the IP from CloudFlare... So, I ran a site in ahrefs and we got an older site we had developed years ago that showed up as follows: Actos-lawsuit.org 104.28.13.57 and again as 104.28.12.57 (duplicate C block is first three sets of numbers are the same and obviously, this has a .12 and a .13 so not duplicate.) Then we looked at our host to see what was the IP shown there: 104.239.226.120. So, this really begs a question of is C Block data or even IP address data still relevant with regard to links? What do the search engines see when they look for IP address now? Yes, I have an opinion, but would love to hear yours first!
Intermediate & Advanced SEO | | RobertFisher0 -
Can we use website content on marketplace websites (Etsy / Amazon etc.)?
Hello Webmasters, my name is Dinesh. I work with Commerce Pundit as a marketing person. We have a question about one of our websites and would like some advice. We have a category page named "Engraved Photos on Wood". Here is the page URL: http://www.canvaschamp.com/engraved-photos-on-wood-plaques My question is about the content we have added to this page. Another team handles our marketplace department, and they are using the same content from this page as the product info or description in the marketplace listings below:
https://www.etsy.com/listing/237807419/personalized-photo-art-or-custom-text-on?ref=listings_manager_grid
http://www.amazon.in/dp/B01003REIC
http://www.amazon.in/dp/B010037IEM
http://www.amazon.in/dp/B01000JG6I
http://www.amazon.in/dp/B01003HT6Y
Does this create a duplicate content issue with our website? Can the marketplace team use our website content for listings on various marketplace websites? We are very serious about the organic ranking of our website, so please let me know whether this is the right approach or whether we have to ask them to stop. Waiting for your reply. Thanks, Dinesh, Commerce Pundit
Intermediate & Advanced SEO | | CommercePundit0 -
Using a US CDN (Cloudflare) for a UK Site. Should I use a UK Based CDN as it says my server is based in USA
Hi all, we are a UK company with UK customers only, and we use the CloudFlare CDN. Our site is hosted by a UK company with servers here, but when I check online where my site is hosted, some sites show the name of our UK hosting company and others say my site is hosted in San Francisco (USA), where I presume CloudFlare is based. I know CloudFlare has a couple of servers in the UK, but given that all my customers are UK based, I don't want this to affect rankings, as I thought it was a ranking benefit to be hosted in the country you are based in. Is there any issue with this, and should I change, or is Google clever enough to know, so I shouldn't worry? Thanks, Pet
Intermediate & Advanced SEO | | PeteC120 -
Ecommerce question - Should I use a CDN for my images. ?
Hi, we are currently in the process of redeveloping our commerce website and I am wondering whether we should use a CDN (content delivery network) for our product images. My category pages currently show approx. 21 product images per page. The page speed is okay but could be better, and the page size is rather large: anything between 600 KB and 1 MB. We already optimise the images in Photoshop. We also do things like minify to get the pages to load as fast as possible, but I think the only thing left is using a CDN, and I have heard mixed reports about this. We are also doing a mobile responsive version of the site, and I know that speed will be king with Google and how it reflects on rankings. Whilst I can see a CDN will improve image load speed, I guess there is a negative SEO impact as well, as images will be stored in the cloud as opposed to on my site/database. Does anyone know how best to implement a CDN without impacting SEO, or know of any good SEO/implementation articles on this? Maybe I should leave some images on my category pages so I can still do the image alt tags etc., and have the remaining images on the CDN? Many thanks, Sarah
Intermediate & Advanced SEO | | SarahCollins0 -
Ever had a case where publication of products & descriptions on eBay or Amazon caused a Panda penalty?
One of our shops got a Panda penalty back in September. We sell all our items with the same product name and the same product description on amazon.com, amazon.co.uk, ebay.com and ebay.co.uk as well. Did you ever have a case where such multichannel sales caused a Panda penalty?
Intermediate & Advanced SEO | | lcourse0 -
How was cdn.seomoz.org configured?
The SEOmoz CDN appears to have a "pull zone" that is set to the root of the domain, such that any static file can be addressed from either subdomain: http://www.seomoz.org/q/moz_nav_assets/images/logo.png http://cdn.seomoz.org/q/moz_nav_assets/images/logo.png The risk of this configuration is that web pages (not just images/CSS/JS) also get cached and served by the CDN. I won't put the URL here for fear of Google indexing it, but if you replace the 'www' in the URL below with 'cdn', you'll see a cached copy of the original: http://www.seomoz.org/ugc/the-greatest-attribution-ever-graphed The worst-case scenario is that the homepage gets indexed. But this doesn't happen here: http://cdn.seomoz.org/ That URL issues a 301 redirect back to the canonical www subdomain. As it should. Here's my question: how was that done? Because maxcdn.com can't do it. If you set a "pull zone" to your entire domain, they'll cache your homepage and everything else. googlebot has a field day with that; it will reindex your entire site off the CDN. Maybe the SEOmoz CDN provider (CloudFront) allows specific URLs to be blocked? Or do you detect the CloudFront IPs and serve them a 301 (which they'd proxy out to anyone requesting cdn.seomoz.org)? One solution is to create a pull zone that points to a folder, like example.com/images... but this doesn't help a complex site that has cacheable content in multiple places (do you Wordpress users really store ALL your static content under /wp-content/ ?). Or, as suggested above, dynamically detect requests from the CDN's proxy servers, and give them a 301 for any HTML-page request. This gets complex quickly, and is both prone to breakage and very difficult to regression-test. Properly retrofitting a complex site to use a CDN, without creating a half-dozen new CDN subdomains, does not appear to be easy.
Intermediate & Advanced SEO | | mcglynn0
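The "detect requests from the CDN and give them a 301 for any HTML page" idea in the last question can be sketched roughly as below. This is an illustration under assumptions, not how cdn.seomoz.org was actually configured: it assumes the CDN identifies itself to the origin with the User-Agent string `Amazon CloudFront`, that static files can be recognised by extension, and that `http://www.example.com` is the canonical site. The names `respond_to`, `CDN_USER_AGENT` and `CANONICAL_ORIGIN` are hypothetical.

```python
from posixpath import splitext

# Assumption: how the CDN identifies itself when it fetches from the origin.
CDN_USER_AGENT = "Amazon CloudFront"
# Assumption: the canonical site that pages should redirect to.
CANONICAL_ORIGIN = "http://www.example.com"
# Assumption: file types the CDN is allowed to cache and serve.
STATIC_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".css", ".js", ".ico"}


def respond_to(path: str, user_agent: str):
    """Return (status, location) for a request that reaches the origin server."""
    from_cdn = user_agent.strip() == CDN_USER_AGENT
    is_static = splitext(path)[1].lower() in STATIC_EXTENSIONS
    if from_cdn and not is_static:
        # The CDN caches this redirect and serves it to anyone asking the
        # CDN host for a page, so the page is never duplicated there.
        return 301, CANONICAL_ORIGIN + path
    # Visitors, and CDN fetches of static files, are served normally.
    return 200, None


if __name__ == "__main__":
    print(respond_to("/", "Amazon CloudFront"))
    print(respond_to("/images/logo.png", "Amazon CloudFront"))
    print(respond_to("/", "Mozilla/5.0"))
```

As the question notes, this kind of rule is easy to break: anything that changes the CDN's User-Agent or adds a new static file type would need the rule updated, so it needs testing against the real CDN.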