Crawling image folders / crawl allowance
-
We recently removed /img and /imgp from our robots.txt file, thus allowing Googlebot to crawl our image folders. Not sure why we had these blocked in the first place, but we opened them up in response to an email from Google Product Search about not being able to crawl images, which can hurt (and has hurt) our traffic from Google Shopping.
My question is: will allowing Google to crawl our image files eat up our 'crawl allowance'? We wouldn't want Google to skip crawling/indexing certain pages, and ding our organic traffic, because more of our allotted crawl bandwidth is getting chewed up crawling image files.
Outside of the non-detailed crawl stat graphs from Webmaster Tools, what's the best way to check how frequently/deeply our site is getting crawled?
Thanks all!
-
I did this accidentally as well recently and had 100% of my products disallowed from Google Shopping within 48 hours. Sounds like blocking is not an option. They need to crawl your images folder to make sure you have valid images in your product listings.
-
If your rankings are improving, then good move!
-
Hey Richard,
We were previously blocking Googlebot from crawling our images at all (by disallowing /img/ and /imgp/ in our robots.txt file). We removed this block after receiving this email from Google:
Thank you for participating in Google Product Search. It has come to our attention that a robots.txt file is preventing us from crawling some or all of the images on your site. In order for us to access and display the images you provide in your product listings, we'd like you to modify your robots.txt file to allow user-agent 'googlebot' to crawl your site.
Failure for Google to access your images may affect the visibility of your items on Google Product Search and Product Ad results.
While I totally agree that image traffic will not convert like standard traffic, it is free and who knows, we may just pick up a few sales from it. Of course, if this eats up a disproportionate amount of our crawl allowance relative to the value we get (image traffic, or avoiding any penalties from Google Product Search), we'd be better off leaving the block on.
By way of an update, it looks like our rankings have started to improve in Google Product Search. We first experienced a drop in rankings and traffic from Product Search on 4/16 and removed the block from robots.txt on 4/22.
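For anyone wanting to confirm that a block like this is really gone, here is a minimal sketch using Python's standard `urllib.robotparser`. The rule sets and the example.com URL are assumptions for illustration (the thread only says /img/ and /imgp/ were disallowed), and the standard-library parser is not guaranteed to match Googlebot's own matching rules in every edge case.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical "before" rules: both image folders blocked for every crawler.
OLD_RULES = """User-agent: *
Disallow: /img/
Disallow: /imgp/
"""

# Hypothetical "after" rules: an empty Disallow blocks nothing.
NEW_RULES = """User-agent: *
Disallow:
"""

# Placeholder URL, not the poster's real site.
IMAGE_URL = "http://www.example.com/img/product.jpg"

if __name__ == "__main__":
    print("before:", can_crawl(OLD_RULES, "Googlebot", IMAGE_URL))
    print("after: ", can_crawl(NEW_RULES, "Googlebot", IMAGE_URL))
```

To test a live file instead of a pasted string, `RobotFileParser` also offers `set_url()` and `read()`.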
-
Why do you need Google to reach inside your img folder? Images display on the page and are indexed then. Sure, if you are selling images, then I can see the need for this, but to just crawl the img folder??
If the folder is not huge, I do not see it penalizing you. I would make sure all images are named using keywords, as crawling pic001.jpg, pic002.jpg, product01.jpg, logo.gif will not do you any good anyway.
Also I find bad linking coming from Google image searches. No one searches to purchase a coffee cup and looks in Google images to do so. Conversely, if someone is searching images of coffee cups to use in whatever, having them click over to your site is a waste of time. They are just going to grab the image and go leaving your metrics a mess.
I hope that helps.
-
It may affect crawl allowance, but that depends on the size of your site, PageRank, trust, etc.
One of the best ways to determine crawl depth and whether you have any issues is to create separate sitemaps for your most important content or areas of your site. You could also create an image sitemap.
Then you can monitor these over time, which will give you a good picture of which content is being crawled and indexed well and which content/images are not. This may also help you to find out if the site structure is too deep or whether you need to link more to deeper content in order to improve crawling and indexation.
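As a rough illustration of the image sitemap idea, the sketch below builds one with Python's standard library, using the sitemaps.org namespace plus Google's image sitemap extension namespace. The function name `build_image_sitemap` and all example.com URLs are made up for the example; a real sitemap would be generated from your own product catalogue.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"


def build_image_sitemap(pages: dict[str, list[str]]) -> str:
    """Build sitemap XML mapping each page URL to the images shown on it."""
    # Register prefixes so the output uses a default namespace and "image:".
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("image", IMAGE_NS)

    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages.items():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        for image_url in image_urls:
            image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Placeholder product page with two placeholder images.
    print(build_image_sitemap({
        "http://www.example.com/products/coffee-cup": [
            "http://www.example.com/img/coffee-cup.jpg",
            "http://www.example.com/imgp/coffee-cup-side.jpg",
        ],
    }))
```

Submitting a file like this separately from the page sitemaps lets you compare submitted versus indexed counts for images on their own.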
Hope this helps.
-
Personally, I wouldn't try to figure out the impact by looking at crawl stats. I'd be more focused on end results. Have we had an increase in organic traffic, or conversions from Google shopping since we opened it up, or has either of these gone down?
That's what matters, and is the only real indicator as to whether it was a wise move or not.
-
You could check your server stats to see who is accessing your site; this should tell you which bots are going to your pages and when. I don't know what control panel you are using for your site, but if you are using cPanel, I am sure there are tutorials online to help you find this information.
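If you can get at the raw access logs, a small script can show how much of Googlebot's activity goes to image files versus pages. This is only a sketch: it assumes logs in the common Apache/nginx "combined" format, treats any user agent containing "Googlebot" as Google (user agents can be spoofed, so this is approximate), and uses the /img/ and /imgp/ prefixes from the original question. The function name `googlebot_hits` is invented for the example.

```python
import re
from collections import Counter

# Matches the request, status, size, referrer and user agent of a combined-format line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(log_lines, image_prefixes=("/img/", "/imgp/")):
    """Count Googlebot requests, split into image-folder hits and everything else."""
    counts = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match or "Googlebot" not in match.group("agent"):
            continue  # unparseable line or a non-Google visitor
        is_image = match.group("path").startswith(image_prefixes)
        counts["images" if is_image else "pages"] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample lines; in practice, iterate over the opened log file instead.
    sample = [
        '66.249.66.1 - - [01/Jan/2000:00:00:00 +0000] "GET /img/cup.jpg HTTP/1.1" 200 5120 "-" "Googlebot-Image/1.0"',
        '66.249.66.1 - - [01/Jan/2000:00:00:01 +0000] "GET /products/cup HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
        '203.0.113.5 - - [01/Jan/2000:00:00:02 +0000] "GET /img/cup.jpg HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    ]
    print(googlebot_hits(sample))
```

Running this over logs from before and after the robots.txt change would show whether page crawling dropped as image crawling picked up.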