Crawling image folders / crawl allowance
-
We recently removed /img and /imgp from our robots.txt file, thus allowing Googlebot to crawl our image folders. Not sure why we had these blocked in the first place, but we opened them up in response to an email from Google Product Search about not being able to crawl images, which can hurt (and has hurt) our traffic from Google Shopping.
My question is: will allowing Google to crawl our image files eat up our 'crawl allowance'? We wouldn't want Google to not crawl/index certain pages, and ding our organic traffic, because more of our allotted crawl bandwidth is getting chewed up crawling image files.
Outside of the non-detailed crawl stat graphs in Webmaster Tools, what's the best way to check how frequently and how deeply our site is getting crawled?
Thanks all!
-
I did this accidentally as well recently and had 100% of my products disallowed from Google Shopping within 48 hours. Sounds like it's not an option. They need to crawl your images folder to make sure you have valid images in your product listings.
-
If your rankings are improving, then it was a good move!
-
Hey Richard,
We were previously blocking Googlebot from crawling our images at all (by disallowing /img/ and /imgp/ in our robots.txt file). We removed this block after receiving this email from Google:
Thank you for participating in Google Product Search. It has come to our attention that a robots.txt file is preventing us from crawling some or all of the images on your site. In order for us to access and display the images you provide in your product listings, we'd like you to modify your robots.txt file to allow user-agent 'googlebot' to crawl your site.
Failure for Google to access your images may affect the visibility of your items on Google Product Search and Product Ad results.
While I totally agree that image traffic will not convert like standard traffic, it is free, and who knows, we may just pick up a few sales from it. Of course, if this comes at the cost of eating up a disproportionate amount of our crawl allowance relative to the value (or of avoiding penalties from Google Product Search), we'd be better off leaving the block on.
By way of an update, it looks like our rankings have started to improve in Google product search. We first experienced a drop in rankings and traffic from Product Search on 4/16 and removed the block from robots.txt on 4/22.
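For anyone wanting to sanity-check a robots.txt change like this, Python's standard-library urllib.robotparser can tell you whether a given user-agent is allowed to fetch a path. A minimal sketch, using the /img/ and /imgp/ paths from this thread (the example.com URLs are placeholders):

```python
from urllib.robotparser import RobotFileParser

# The old rules that blocked image crawling, as described above
blocked = RobotFileParser()
blocked.parse([
    "User-agent: *",
    "Disallow: /img/",
    "Disallow: /imgp/",
])

# Googlebot falls under "User-agent: *", so image paths are blocked
# but regular pages are not
print(blocked.can_fetch("Googlebot", "http://example.com/img/product01.jpg"))  # False
print(blocked.can_fetch("Googlebot", "http://example.com/product-page"))       # True

# After removing the Disallow lines, everything is crawlable
opened = RobotFileParser()
opened.parse(["User-agent: *", "Disallow:"])
print(opened.can_fetch("Googlebot", "http://example.com/img/product01.jpg"))   # True
```

Running something like this against your actual robots.txt rules before and after a change is a cheap way to confirm you unblocked exactly what you intended.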
-
Why do you need Google to reach inside your img folder? Images display on the page and are indexed then. Sure, if you are selling images, then I can see the need for this, but to just crawl the img folder??
If it is not huge, I do not see it penalizing you. I would make sure all images are named using keywords, as crawling generically named files like pic001.jpg, pic002.jpg, product01.jpg, or logo.gif will not do you any good anyway.
Also, I find the traffic that comes from Google image search to be poor quality. No one who searches to purchase a coffee cup looks in Google Images to do so. Conversely, if someone is searching for images of coffee cups to use elsewhere, having them click over to your site is a waste of time. They are just going to grab the image and go, leaving your metrics a mess.
I hope that helps.
-
It may affect your crawl allowance, but that depends on the size of your site, PageRank, trust, etc.
One of the best ways to determine crawl depth and whether you have any issues is to create separate sitemaps for your most important content or areas of your site. You could also create an image sitemap.
Then you can monitor these over time, and they will give you a good picture of which content is being crawled and indexed well and which content/images are not. This may also help you find out whether the site structure is too deep or whether you need to link more to deeper content in order to improve crawling and indexation.
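To illustrate the image sitemap idea, here is one minimal way to generate one with Python's standard library. The page and image URLs are placeholders, and this is only a sketch of the format Google documents, not a full generator:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"

ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("image", IMAGE_NS)

# Hypothetical pages and the product images shown on each
pages = {
    "http://www.example.com/product-page": [
        "http://www.example.com/img/product01.jpg",
        "http://www.example.com/img/product01-alt.jpg",
    ],
}

urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
for page_url, image_urls in pages.items():
    url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
    ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
    for img_url in image_urls:
        image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
        ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = img_url

xml_bytes = ET.tostring(urlset, encoding="utf-8", xml_declaration=True)
print(xml_bytes.decode("utf-8"))
```

You would save the output as something like image-sitemap.xml and submit it in Webmaster Tools alongside your regular sitemaps, so crawl and index coverage of images can be tracked separately.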
Hope this helps.
-
Personally, I wouldn't try to figure out the impact by looking at crawl stats. I'd be more focused on end results. Have we had an increase in organic traffic or conversions from Google Shopping since we opened it up, or has either of these gone down?
That's what matters, and is the only real indicator as to whether it was a wise move or not.
-
You could check your server stats to see who is accessing your site; this should tell you which bots are hitting your pages and when. I don't know what control panel you are using for your site, but if you are using cPanel, I am sure there are tutorials online to help you find this information.
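If you have raw access logs (e.g., Apache combined format), even a small script can show which paths Googlebot is hitting and how often. A rough sketch with made-up log lines for illustration; for real verification you should also reverse-DNS the IPs, since the user-agent string can be spoofed:

```python
import re
from collections import Counter

# Sample lines in Apache combined log format (made up for illustration)
log_lines = [
    '66.249.66.1 - - [22/Apr/2011:10:00:00 +0000] "GET /img/product01.jpg HTTP/1.1" 200 512 "-" "Googlebot-Image/1.0"',
    '66.249.66.1 - - [22/Apr/2011:10:00:05 +0000] "GET /product-page HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.10 - - [22/Apr/2011:10:00:09 +0000] "GET /product-page HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
]

# Pull the request path out of the quoted request line
request_re = re.compile(r'"(?:GET|POST|HEAD) (\S+) [^"]*"')

hits = Counter()
for line in log_lines:
    if "Googlebot" not in line:  # matches Googlebot and Googlebot-Image
        continue
    match = request_re.search(line)
    if match:
        hits[match.group(1)] += 1

for path, count in hits.most_common():
    print(f"{count:5d}  {path}")
```

Run against a real log file, this kind of tally makes it easy to see what share of Googlebot's requests are going to image files versus the pages you actually want crawled.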