Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies.
Dilemma about "images" folder in robots.txt
-
Hi, hope you're doing well.

I'm sure you're aware that Google has updated its webmaster technical guidelines to say that sites should allow access to their CSS and JavaScript files where possible. Google used to render web pages as text only; now it says it can read CSS and JavaScript. In Google's own words, blocking those files can result in suboptimal rankings:

"Disallowing crawling of Javascript or CSS files in your site’s robots.txt directly harms how well our algorithms render and index your content and can result in suboptimal rankings."
http://googlewebmastercentral.blogspot.com/2014/10/updating-our-technical-webmaster.html

We have allowed access to our CSS files, and Googlebot now sees our web pages more like a normal user would (tested in GWT).

Anyhow, this is my dilemma, and I'm sure a lot of other users are facing the same situation. Like any other e-commerce site, we have a lot of images. Our CSS files used to be inside our images folder, so I have allowed access to that. Here's the robots.txt --> http://www.modbargains.com/robots.txt

Right now we are blocking the images folder because it is huge and heavy, and some of the images are very high-res. We feel that Googlebot might spend almost all of its time crawling the "images" folder and not have enough time left for our other important pages, not to mention the heavy server load on both Google's side and ours. We do have good, high-quality original pictures, and we feel that we are losing potential rankings by blocking them.

I was thinking of allowing ONLY the Google image bot to access the folder, but I still feel that Google might spend a lot of time on it. **I was wondering whether Google decides something like "let me spend 10 minutes on the Google image bot and 20 minutes on the Google mobile bot", or whether it has separate "time spending" allocations for each of its bot types. I want to unblock the images folder, for now only for the Google image bot, but I fear that it might drastically hamper indexing of our important pages because, as I mentioned, we have tons and tons of images, with Google spending so much time just crawling that folder.**

Any advice, recommendations, suggestions, technical guidance, or a plan of action? I'm pretty sure I have answered my own question, but I need confirmation from an expert that I am right to allow only the Google image bot access to my images folder.

Sincerely,
Shaleen Shah
-
Yup, my images send me traffic from Google Images on most of my sites, and attractive images attract hotlinks as well. At the moment people are hosting their images on a different domain (a CDN) and are still being credited with the images, but I haven't tried that myself, i.e. I don't know whether they've set up some kind of "ownership" somewhere.
-
I recommend allowing Google to crawl those images. Google optimizes its crawl rate, and once it has done a complete crawl it will understand how often to crawl certain areas of your site. My main concern would be that you are losing potential rankings and indexing from those images - if they are unique and high quality, you definitely want Google to crawl the images, understand the file names, and index them appropriately.
I wouldn't be concerned about Googlebot eating up your server resources. If it does become a problem, you can go back and adjust bot access through robots.txt, as you've done already. I would let them in first and only react if it becomes a problem.
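If you do want to try the "image bot only" idea from the question, or tighten access again later, a rule set can be sanity-checked offline before it goes live. Below is a minimal sketch, not a definitive setup: it uses Python's standard urllib.robotparser, the example.com URLs and the can_crawl helper are hypothetical, and the rules are an illustration rather than the site's real robots.txt. Python's parser is not Google's and may resolve overlapping user-agent groups differently, so treat the result as a rough check only.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules (an assumption, not the real modbargains.com file):
# only the image crawler may fetch /images/, everything else stays open.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Allow: /images/

User-agent: *
Disallow: /images/
"""


def can_crawl(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URLs used only to exercise the rules above.
    checks = [
        ("Googlebot-Image", "https://www.example.com/images/wheel.jpg"),
        ("Googlebot", "https://www.example.com/images/wheel.jpg"),
        ("Googlebot", "https://www.example.com/products/wheel.html"),
    ]
    for agent, url in checks:
        print(agent, url, can_crawl(agent, url))
```

With these rules the image crawler is allowed into /images/, while other crawlers are kept out of that folder but can still reach the product pages.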
I have tens of thousands of product images accessed by Googlebot, and it is of no concern to my e-commerce company or its server resources. I'm not saying that it can't be a problem, but the benefit outweighs the risk - I choose a reactive stance in this situation.
Closely monitor your Google Webmaster Tools account, watch the crawl rate and statistics, and if it becomes an issue then decide on which image folders should or shouldn't be indexed.
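Webmaster Tools' crawl stats are the first thing to watch. As a complementary check (an addition here, not something GWT provides), you could also count Googlebot requests per top-level folder in your server access log. This is a minimal sketch assuming the common Apache/nginx "combined" log format; the function name and any sample lines you feed it are hypothetical, and matching on the user-agent string alone does not verify that a request really came from Google.

```python
import re
from collections import Counter

# Matches the request, status, size, referrer and user-agent fields of a
# "combined" format access-log line (assumed format; adjust for your server).
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits_by_section(lines):
    """Count requests per top-level folder for user agents containing 'Googlebot'."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match or "Googlebot" not in match.group("agent"):
            continue
        # "/images/wheel.jpg" -> "/images"; "/" -> "/"
        first_segment = match.group("path").lstrip("/").split("/", 1)[0]
        counts["/" + first_segment] += 1
    return counts
```

If the count for the images folder dwarfs everything else and your important pages are being crawled less often, that is the point at which it makes sense to start restricting individual image folders.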