Crawling image folders / crawl allowance
-
We recently removed /img and /imgp from our robots.txt file thus allowing googlebot to crawl our image folders. Not sure why we had these blocked in the first place, but we opened them up in response to an email from Google Product Search about not being able to crawl images - which can/has hurt our traffic from Google Shopping.
My question is: will allowing Google to crawl our image files eat up our 'crawl allowance'? We wouldn't want Google to not crawl/index certain pages, and ding our organic traffic, because more of our allotted crawl bandwidth is getting chewed up crawling image files.
Outside of the non-detailed crawl stat graphs from Webmaster Tools, what's the best way to check how frequently/ deeply our site is getting crawled?
Thanks all!
-
I did this accidentally as well recently and had 100% of my products disallowed from google shopping within 48 hours. Sounds like it's not an option. They need the crawl your images folder to make sure you have valid images in you product listings.
-
if your rankings are improving, then good move!
-
Hey Richard,
We were previously blocking googlebot from crawling our images at all (through disallowing /img/ and /imgp/ in robots.txt file. We removed this block after recieving this email from Google:
Thank you for participating in Google Product Search. It has come to our attention that a robots.txt file is preventing us from crawling some or all of the images on your site. In order for us to access and display the images you provide in your product listings, we'd like you to modify your robots.txt file to allow user-agent 'googlebot' to crawl your site.
_Failure for Google to access your images may affect the visibility of your items on Google Product Search and Product Ad results. _
While I totally agree that image traffic will not convert like standard traffic, it is free and who knows, we may just pick up a few sales from it. Of course if this comes at the cost of eating up a disproportionate amount of our crawl allowance relative to the value (or avoiding any penalties from Google Product Search) we'd be better off leaving the block on.
By way of an update, it looks like our rankings have started to improve in Google product search. We first experienced a drop in rankings and traffic from Product Search on 4/16 and removed the block from robots.txt on 4/22.
-
Why do you need Google to reach inside your img folder? Images display on the page and are indexed then. Sure, if you are selling images, then I can see the need for this, but to just crawl the img folder??
If it is not huge, I do not see it penalizing you. I would make sure all images are named using keywords as crawling pic001.jpg, pic002.jpg, product01.jpg, logo.gif will not do you any good anyway.
Also I find bad linking coming from Google image searches. No one searches to purchase a coffee cup and looks in Google images to do so. Conversely, if someone is searching images of coffee cups to use in whatever, having them click over to your site is a waste of time. They are just going to grab the image and go leaving your metrics a mess.
I hope that helps.
-
It may effect crawl allowance but depends on the size of your site, page rank and trust etc.
One of the best ways to determine crawl depth and whether you have any issues is to create separate sitemaps for your most important content or areas of your site. You could also create an image sitemap.
Then you can monitor these over time and and will give you a good picture of which content is being crawled and indexed well and which content/images are not. This may also help you to find out if the site structure is too deep or whether you need to link more to deeper content in order to improve crawling and indexation.
Hope this helps.
-
Personally, I wouldn't try to figure out the impact by looking at crawl stats. I'd be more focused on end results. Have we had an increase in organic traffic, or conversions from Google shopping since we opened it up, or has either of these gone down?
That's what matters, and is the only real indicator as to whether it was a wise move or not.
-
You could check your server stats on who is accessing your site, this should tell you what bots are going to your pages when. I don't know what control panel you are using for your site, but if you are using Cpanel, I am sure there are tutorials online to help you find this information.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Duplicate Content/Similar Pages
Hello, I'm working on our site and I'm coming into an issue with the duplicate content. Our company manufactures heavy-duty mobile lifts. We have two main lifts. They are the same, except for capacity. We want to keep the format similar and the owner of the company wants each lift to have its own dedicated page. Obviously, since the layout is the same and content is similar I'm getting the duplicate content issue. We also have a section of our accessories and a section of our parts. Each of these sections have individual pages for the accessory/part. Again, the pages are laid out in a similar fashion to keep the cohesiveness, and the content is different, however similar. Meaning different terminology, part numbers, stock numbers, etc., but the overall wording is similar. What can I do to combat these issues? I think our ratings are dropping due to the duplicate content.
Technical SEO | | slecinc0 -
Crawl depth and www
I've run a crawl on a popular amphibian based tool, just wanted to confirm... should http://www.homepage be at crawl depth 0 or 1? The audit shows http://homepage at level 0 and http://www.homepage at level 1 through a redirect. Thanks
Technical SEO | | Focus-Online-Management0 -
Image Search
Hello Community, I have been reading and researching about image search and trying to find patterns within the results but unfortunately I could not get to a conclusion on 2 matters. Hopefully this community would have the answers I am searching for. 1) Watermarked Images (To remove or not to remove watermark from photos) I see a lot of confusion on this subject and am pretty much confused myself. Although it might be true that watermarked photos do not cause a punishment, it sure does not seem to help. At least in my industry and on a bunch of different random queries I have made, watermarked images are hard to come by on Google's images results. Usually the first results do not have any watermarks. I have read online that Google takes into account user behavior and most users prefer images with no watermark. But again, it is something "I have read online" so I don't have any proof. I would love to have further clarification and, if possible, a definite guide on how to improve my image results. 2) Multiple nested folders (Folder depth) Due to speed concerns our tech guys are using 1 image per folder and created a convoluted folder structure where the photos are actually 9 levels deep. Most of our competition and many small Wordpress blogs outrank us on Google images and on ALL INSTANCES I have checked, their photos are 3, 4 or 5 levels deep. Never inside 9 nested folders.
Technical SEO | | Koki.Mourao
So... A) Should I consider removing the watermark - which is not that intrusive but is visible?
B) Should I try to simplify the folder structure for my photos? Thank you0 -
All of my Image Looks like as a Link
Hey guyz,
Technical SEO | | atakala
I know I ask so many questions these days 😄 but you know ...
Anyway;
I have a website which has only 22 pages and today I try to crawl it with screaming frog and then I face with many image internal link.
But the problem is I dont see any of them when look at the site and source code of the site.
So why does it happen ?
Is there any idea ?
Please share with me.
Thank you .
http://i.imgur.com/dtj7Ws7.png dtj7Ws7.png0 -
Salvaging links from WMT “Crawl Errors” list?
When someone links to your website, but makes a typo while doing it, those broken inbound links will show up in Google Webmaster Tools in the Crawl Errors section as “Not Found”. Often they are easy to salvage by just adding a 301 redirect in the htaccess file. But sometimes the typo is really weird, or the link source looks a little scary, and that's what I need your help with. First, let's look at the weird typo problem. If it is something easy, like they just lost the last part of the URL, ( such as www.mydomain.com/pagenam ) then I fix it in htaccess this way: RewriteCond %{HTTP_HOST} ^mydomain.com$ [OR] RewriteCond %{HTTP_HOST} ^www.mydomain.com$ RewriteRule ^pagenam$ "http://www.mydomain.com/pagename.html" [R=301,L] But what about when the last part of the URL is really screwed up? Especially with non-text characters, like these: www.mydomain.com/pagename1.htmlsale www.mydomain.com/pagename2.htmlhttp:// www.mydomain.com/pagename3.html" www.mydomain.com/pagename4.html/ How is the htaccess Rewrite Rule typed up to send these oddballs to individual pages they were supposed to go to without the typo? Second, is there a quick and easy method or tool to tell us if a linking domain is good or spammy? I have incoming broken links from sites like these: www.webutation.net titlesaurus.com www.webstatsdomain.com www.ericksontribune.com www.addondashboard.com search.wiki.gov.cn www.mixeet.com dinasdesignsgraphics.com Your help is greatly appreciated. Thanks! Greg
Technical SEO | | GregB1230 -
Need some help Pagerank N/A
Hi All! We have a small e commerce site, www.ruggedpcstore.com, built on Wordpress. The home page has a PR of 3, our About page is a PR 0, and all other pages are PR N/A. We've been pulling our hair out trying to fix what's wrong. Any ideas?
Technical SEO | | CsmBill0 -
Jquery image - hidden text?
I'm working on a site that has a jquery image rotation. Wondering if how the developer set it up is consider blackhat or spammy at all. The jquery has 3 images that rotate. Each image has text in it - which is then placed in a H1 or H2 tag behind the images. When viewing the site with images and javascript turned off it looks like the text is the same color as the background and not lined up nicely so that it is visible to anyone who has images and javascript turned off. Let me know if this is a bad practice. If so, what is the best practice to handle this? If the text were another color and aligned neatly visible behind the image would it be safe? Should we just be using an image alt tag instead? What about losing the H1, H2 power then? Any other suggestions for improving the SEO for jquery image rotations where important text appears? I can PM the site URL if you want to take a look. Thanks in advance!
Technical SEO | | IvieDigital0 -
Blocking AJAX Content from being crawled
Our website has some pages with content shared from a third party provider and we use AJAX as our implementation. We dont want Google to crawl the third party's content but we do want them to crawl and index the rest of the web page. However, In light of Google's recent announcement about more effectively indexing google, I have some concern that we are at risk for that content to be indexed. I have thought about x-robots but have concern about implementing it on the pages because of a potential risk in Google not indexing the whole page. These pages get significant traffic for the website, and I cant risk. Thanks, Phil
Technical SEO | | AU-SEO0