Images on sub domain fed from CDN
-
I have a client that uses a CDN to fill images, from a sub domain ( images.domain.com). We've made sure that the sub domain itself is not blocked. We've added a robots.txt file, we're creating an image sitemap file & we've verified ownership of the domain within GWT.
Yet, any crawler that I use only see's the first page of the sub domain (which is .html) but none of the subsequent URL's which are all .jpeg.
Is there something simple I'm missing here?
-
Alphonse it sounded like they were just waiting for the sitemap to launch. Other than that, I couldn't think of anything else to add because the sitemap should solve their issue. However, I have marked this as "Discussion" again.
-
I am a little confused. The question was marked answered, but which one is the answer?
-
We have the same issue however we have image XML sitemaps on each country subdomain's XML Index which point to the image files on images.domain.com.
Example:
https://uk.domain.com/image-sitemap1.xml
https://us.domain.com/image-sitemap1.xml
These 2 files are the same.
We also don't have a homepage on images.domain.com and it currently responds with a 404.
Do you think we need to create a landing page on the homepage and host the image XML sitemap at https://images.domain.com/images-sitemap1.xml rather than in each sub-domain?
Thanks.
-
Yes, we are doing everything correctly, aside from waiting for IT department to create a sitemap.
-
Are you using your own subdomain or one somewhere else (e.g. akamai.com)? You should use your own subdomain, if possible.
Was this a change from a previous version that didn't use a CDN? If those images were/are hosted on your primary domain be sure to match the filenames and paths as closely as possible to what they were before.
If you're doing that you shouldn't have a problem once the sitemap is submitted.
For more information please check out this post:
http://www.goinflow.com/four-seo-best-practices-for-using-a-content-delivery-network-cdn/How do you know that Google only attempts to crawl the primary domain URL (i.e. the .html page)? Are you checking log files?
Is the crawler you're using set to crawl external URLs? If not, that could be the issue. Technically a subdomain is a totally separate website so most tools don't crawl them by default.
-
We've correctly applied the CNAME directive from the CDN to reflect the subdomain. Yet, when Google or any other tool attempts to crawl it only shows ONE URL. Not the images that are residing on their own independent URL's.
-
In order to put those image URLs for the crawler to be able to access them you should either:
- Link to the URLs of the images (does that .html page in the subdomain contain these URLs?)
or
- Use the images URLs as resources in the pages already been crawled. Unfortunately this could be tricky when dealing with CDNs since those resources are dynamic.
In either case, the sitemap will solve your problem.
-
The sitemap is not completed yet. Server logs show Googlebot only indexing one page the .html page, not other pages.
-
Did you reference the sitemap in the robots.txt file or did you set up it in GWT?
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Technical guide for Setting up a CDN to host our images, as well as creating an image sitemap, and setting up the CDN in GWT?
Hi All! We're thinking of setting up a CDN to host our images with a CNAME on a subdomain of our site. In terms of SEO, I was wondering if any of you knew of a pretty complete technical guide for setting it all up. Including whether or not we need to create an image sitemap, and setting it up in GWT. Thanks in advance! Vince
Technical SEO | | jbrisebois0 -
How would I make an image sitemap for a site with 50k images?
Is it worth creating an image sitemap? I read an image sitemap can only have 1k entries would I need to make 50 different ones?
Technical SEO | | EcommerceSite0 -
Multiple domain SEO strategy
Hi Mozzers I'm an AM at a web dev. We're building a new site for a client who sells paint to different markets: Paint for boats Paint for construction industry Paint for, well you get the idea! Would we be better off setting up separate domains - boatpaintxxx.com, housepaintxxx.com, etc - and treat each as a searate microsites for standalone SEO activity or have them as individual pages/sub doms from a single domain - paints4all.com or something? From what i've read today, including the excellent Beginners Guide - I'm guessing there's no definitive answer! Feedback appreciated! Thanks.
Technical SEO | | rikmon0 -
Checkout on different domain
Is it a bad SEO move to have a your checkout process on a separate domain instead of the main domain for a ecommerce site. There is no real content on the checkout pages and they are completely new pages that are not indexed in the search engines. Do to the backend architecture it is impossibe for us to have them on the same domain. An example is this page: http://www.printingforless.com/2/Brochure-Printing.html One option we've discussed to not pass page rank on to the checkout domain by iFraming all of the links to the checkout domain. We could also move the checkout process to a subdomain instead of a new domain. Please ignore the concerns with visitors security and conversion rate. Thanks!
Technical SEO | | PrintingForLess.com0 -
What is the best method to block a sub-domain, e.g. staging.domain.com/ from getting indexed?
Now that Google considers subdomains as part of the TLD I'm a little leery of testing robots.txt with something like: staging.domain.com
Technical SEO | | fthead9
User-agent: *
Disallow: / in fear it might get the www.domain.com blocked as well. Has anyone had any success using robots.txt to block sub-domains? I know I could add a meta robots tag to the staging.domain.com pages but that would require a lot more work.0 -
Will password protecting my test sub-domain help keep the SEs from indexing it?
Hi, all. I'm working in an unfamiliar area here, so I hope someone can tell me if I'm out in left field. I am building a sub-domain called http://test.mysite.com, so that I can upload a client's still-under-construction site while working on it. When completed, it'll go up on his server, replacing his old site. Obviously, I want to ensure that it doesn't get indexed while it's on my test platform. A friend suggested that I password it with htaccess and htpasswd, since we can never be certain the SEs will obey site directives. My question is, what do you think would be the best (and hopefully, simplest) way to accomplish this? I'm no code-monkey, so "simple" is a big plus! Doc By the way, the platform will be Wordpress CMS.
Technical SEO | | Doc_Sheldon0 -
Google refuses to index our domain. Any suggestions?
A very similar question was asked previously. (http://www.seomoz.org/q/why-google-did-not-index-our-domain) We've done everything in that post (and comments) and then some. The domain is http://www.miwaterstewardship.org/ and, so far, we have: put "User-agent: * Allow: /" in the robots.txt (We recently removed the "allow" line and included a Sitemap: directive instead.) built a few hundred links from various pages including multiple links from .gov domains properly set up everything in Webmaster Tools submitted site maps (multiple times) checked the "fetch as googlebot" display in Webmaster Tools (everything looks fine) submitted a "request re-consideration" note to Google asking why we're not being indexed Webmaster Tools tells us that it's crawling the site normally and is indexing everything correctly. Yahoo! and Bing have both indexed the site with no problems and are returning results. Additionally, many of the pages on the site have PR0 which is unusual for a non-indexed site. Typically we've seen those sites have no PR at all. If anyone has any ideas about what we could do I'm all ears. We've been working on this for about a month and cannot figure this thing out. Thanks in advance for your advice.
Technical SEO | | NetvantageMarketing0 -
Seomoz api for domains working, for domains+directory not?
We're working on a tool using the seomoz api ... for domains we're always getting the right values, but for longer URLs we're having troubles ... Example: http://www.seomoz.org/blog/6-reasons-why-qa-sites-can-boost-your-seo-in-2011-despite-googles-farmer-update-12160 won't work http://www.seomoz.org/blog works Any idea what we might be doing wrong?
Technical SEO | | gmellak0