Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Dilemma about "images" folder in robots.txt
-
Hi, Hope you're doing well.
I am sure, you guys must be aware that Google has updated their webmaster technical guidelines saying that users should allow access to their css files and java-scripts file if it's possible. Used to be that Google would render the web pages only text based. Now it claims that it can read the css and java-scripts. According to their own terms, not allowing access to the css files can result in sub-optimal rankings. "Disallowing crawling of Javascript or CSS files in your site’s robots.txt directly harms how well our algorithms render and index your content and can result in suboptimal rankings."http://googlewebmastercentral.blogspot.com/2014/10/updating-our-technical-webmaster.htmlWe have allowed access to our CSS files. and Google bot, is seeing our webapges more like a normal user would do. (tested it in GWT)Anyhow, this is my dilemma. I am sure lot of other users might be facing the same situation. Like any other e commerce companies/websites.. we have lot of images. Used to be that our css files were inside our images folder, so I have allowed access to that. Here's the robots.txt --> http://www.modbargains.com/robots.txtRight now we are blocking images folder, as it is very huge, very heavy, and some of the images are very high res. The reason we are blocking that is because we feel that Google bot might spend almost all of its time trying to crawl that "images" folder only, that it might not have enough time to crawl other important pages. Not to mention, a very heavy server load on Google's and ours. we do have good high quality original pictures. We feel that we are losing potential rankings since we are blocking images. I was thinking to allow ONLY google-image bot, access to it. But I still feel that google might spend lot of time doing that. **I was wondering if Google makes a decision saying, hey let me spend 10 minutes for google image bot, and let me spend 20 minutes for google-mobile bot etc.. or something like that.. , or does it have separate "time spending" allocations for all of it's bot types. I want to unblock the images folder, for now only the google image bot, but at the same time, I fear that it might drastically hamper indexing of our important pages, as I mentioned before, because of having tons & tons of images, and Google spending enough time already just to crawl that folder.**Any advice? recommendations? suggestions? technical guidance? Plan of action? Pretty sure I answered my own question, but I need a confirmation from an Expert, if I am right, saying that allow only Google image access to my images folder. Sincerely,Shaleen Shah
-
Yup my images send me traffic from Google images on most of my sites and attractive images attract hotlinks as well. At the moment people are hosting their images on a different domain (cdn) and are still being credited with the images but I haven't tried to do that myself ie I don't know if they've set some "ownership" somewhere and somehow.
-
I recommend allowing Google to crawl those images. Google optimizes its crawl rate and once it has done a complete crawl it will understand how often to crawl certain areas of your site. My main concern would be that you are losing potential rankings and indexing from those images - if they are unique and high quality you definitely want them to index the images, understand the file names, and appropriately index them.
I wouldn't be concerned about Google bot eating up your server resources. If it does become a problem, then you can go back and adjust the bot access through the robots.txt, as you've done already. However, I would let them in first and only react if it becomes a problem.
I have tens of thousands of product images accessed by the google bot and it is no concern to my ecommerce company and the server resources. I'm not saying that it can't be a potential problem, but the benefit outweighs the risk of it being one - I choose a reactive stance in this situation.
Closely monitor your Google Webmaster Tools account, watch the crawl rate and statistics, and if it becomes an issue then decide on which image folders should or shouldn't be indexed.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
"Avoid Too Many Internal Links" when you have a mega menu
Using the on-page grader and whilst further investigating internal linking, I'm concerned that as the ecommerce website has a very link heavy mega menu the rule of 100 may be impeding on the contextual links we're creating. Clearly we don't want to no-follow our entire menu. Should we consider no-indexing the third-level- for example short sleeve shirts here... Clothing > Shirts > Short Sleeve Shirts What about other pages we're don't care to index anyway such as the 'login page' the 'cart' the search button? Any thoughts appreciated.
Intermediate & Advanced SEO | | Ant-Scarborough0 -
Does anyone know how to fix this structured data error on search console? Invalid value in field "itemtype"
I'm getting the same structured data error on search console form most of my websites, Invalid value in field "itemtype" I take off all the structured data but still having this problem, according to Search console is a syntax problem but I can't find what is causing this. Any guess, suggestion or solution for this?
Intermediate & Advanced SEO | | Alexanders0 -
SEO on Jobs sites: how to deal with expired listings with "Google for Jobs" around
Dear community, When dealing with expired job offers on jobs sites from a SEO perspective, most practitioners recommend to implement 301 redirects to category pages in order to keep the positive ranking signals of incoming links. Is it necessary to rethink this recommendation with "Google for Jobs" is around? Google's recommendations on how to handle expired job postings does not include 301 redirects. "To remove a job posting that is no longer available: Remove the job posting from your sitemap. Do one of the following: Note: Do NOT just add a message to the page indicating that the job has expired without also doing one of the following actions to remove the job posting from your sitemap. Remove the JobPosting markup from the page. Remove the page entirely (so that requesting it returns a 404 status code). Add a noindex meta tag to the page." Will implementing 301 redirects the chances to appear in "Google for Jobs"? What do you think?
Intermediate & Advanced SEO | | grnjbs07175 -
My "search visibility" went from 3% to 0% and I don't know why.
My search visibility on here went from 3.5% to 3.7% to 0% to 0.03% and now 0.05% in a matter of 1 month and I do not know why. I make changes every week to see if I can get higher on google results. I do well with one website which is for a medical office that has been open for years. This new one where the office has only been open a few months I am having trouble. We aren't getting calls like I am hoping we would. In fact the only one we did receive I believe is because we were closest to him in proximity on google maps. I am also having some trouble with the "Links" aspect of SEO. Everywhere I see to get linked it seems you have to pay. We are a medical office we aren't selling products so not many Blogs would want to talk about us. Any help that could assist me with getting a higher rank on google would be greatly appreciated. Also any help with getting the search visibility up would be great as well.
Intermediate & Advanced SEO | | benjaminleemd1 -
Are HTML Sitemaps Still Effective With "Noindex, Follow"?
A site we're working on has hundreds of thousands of inventory pages that are generally "orphaned" pages. To reach them, you need to do a lot of faceting on the search results page. They appear in our XML sitemaps as well, but I'd still consider these orphan pages. To assist with crawling and indexation, we'd like to create HTML sitemaps to link to these pages. Due to the nature (and categorization) of these products, this would mean we'll be creating thousands of individual HTML sitemap pages, which we're hesitant to put into the index. Would the sitemaps still be effective if we add a noindex, follow meta tag? Does this indicate lower quality content in some way, or will it make no difference in how search engines will handle the links therein?
Intermediate & Advanced SEO | | mothner0 -
Switching from Http to Https, but what about images and image link juice?
Hi Ya'll. I'm transitioning our http version website to https. Important question: Do images have to have 301 redirects? If so, how and where? Please send me a link or explain best practices. Best, Shawn
Intermediate & Advanced SEO | | Shawn1241 -
Would changing the file name of an image (not the alt attribute) have an effect of on seo / ranking of that image and thus the site?
Would changing the file name of image, not the alt attribute nor the image itself (so it would be exactly the same but just a name change) have any effect on : a) A sites seo ranking b) the individual images seo ranking (although i guess if b) would be true it would have an effect on a) although potentially small.) This is the sort of change i would be thinking of making :  changed to 
Intermediate & Advanced SEO | | Sam-P0 -
Avoiding Duplicate Content with Used Car Listings Database: Robots.txt vs Noindex vs Hash URLs (Help!)
Hi Guys, We have developed a plugin that allows us to display used vehicle listings from a centralized, third-party database. The functionality works similar to autotrader.com or cargurus.com, and there are two primary components: 1. Vehicle Listings Pages: this is the page where the user can use various filters to narrow the vehicle listings to find the vehicle they want.
Intermediate & Advanced SEO | | browndoginteractive
2. Vehicle Details Pages: this is the page where the user actually views the details about said vehicle. It is served up via Ajax, in a dialog box on the Vehicle Listings Pages. Example functionality: http://screencast.com/t/kArKm4tBo The Vehicle Listings pages (#1), we do want indexed and to rank. These pages have additional content besides the vehicle listings themselves, and those results are randomized or sliced/diced in different and unique ways. They're also updated twice per day. We do not want to index #2, the Vehicle Details pages, as these pages appear and disappear all of the time, based on dealer inventory, and don't have much value in the SERPs. Additionally, other sites such as autotrader.com, Yahoo Autos, and others draw from this same database, so we're worried about duplicate content. For instance, entering a snippet of dealer-provided content for one specific listing that Google indexed yielded 8,200+ results: Example Google query. We did not originally think that Google would even be able to index these pages, as they are served up via Ajax. However, it seems we were wrong, as Google has already begun indexing them. Not only is duplicate content an issue, but these pages are not meant for visitors to navigate to directly! If a user were to navigate to the url directly, from the SERPs, they would see a page that isn't styled right. Now we have to determine the right solution to keep these pages out of the index: robots.txt, noindex meta tags, or hash (#) internal links. Robots.txt Advantages: Super easy to implement Conserves crawl budget for large sites Ensures crawler doesn't get stuck. After all, if our website only has 500 pages that we really want indexed and ranked, and vehicle details pages constitute another 1,000,000,000 pages, it doesn't seem to make sense to make Googlebot crawl all of those pages. Robots.txt Disadvantages: Doesn't prevent pages from being indexed, as we've seen, probably because there are internal links to these pages. We could nofollow these internal links, thereby minimizing indexation, but this would lead to each 10-25 noindex internal links on each Vehicle Listings page (will Google think we're pagerank sculpting?) Noindex Advantages: Does prevent vehicle details pages from being indexed Allows ALL pages to be crawled (advantage?) Noindex Disadvantages: Difficult to implement (vehicle details pages are served using ajax, so they have no tag. Solution would have to involve X-Robots-Tag HTTP header and Apache, sending a noindex tag based on querystring variables, similar to this stackoverflow solution. This means the plugin functionality is no longer self-contained, and some hosts may not allow these types of Apache rewrites (as I understand it) Forces (or rather allows) Googlebot to crawl hundreds of thousands of noindex pages. I say "force" because of the crawl budget required. Crawler could get stuck/lost in so many pages, and my not like crawling a site with 1,000,000,000 pages, 99.9% of which are noindexed. Cannot be used in conjunction with robots.txt. After all, crawler never reads noindex meta tag if blocked by robots.txt Hash (#) URL Advantages: By using for links on Vehicle Listing pages to Vehicle Details pages (such as "Contact Seller" buttons), coupled with Javascript, crawler won't be able to follow/crawl these links. Best of both worlds: crawl budget isn't overtaxed by thousands of noindex pages, and internal links used to index robots.txt-disallowed pages are gone. Accomplishes same thing as "nofollowing" these links, but without looking like pagerank sculpting (?) Does not require complex Apache stuff Hash (#) URL Disdvantages: Is Google suspicious of sites with (some) internal links structured like this, since they can't crawl/follow them? Initially, we implemented robots.txt--the "sledgehammer solution." We figured that we'd have a happier crawler this way, as it wouldn't have to crawl zillions of partially duplicate vehicle details pages, and we wanted it to be like these pages didn't even exist. However, Google seems to be indexing many of these pages anyway, probably based on internal links pointing to them. We could nofollow the links pointing to these pages, but we don't want it to look like we're pagerank sculpting or something like that. If we implement noindex on these pages (and doing so is a difficult task itself), then we will be certain these pages aren't indexed. However, to do so we will have to remove the robots.txt disallowal, in order to let the crawler read the noindex tag on these pages. Intuitively, it doesn't make sense to me to make googlebot crawl zillions of vehicle details pages, all of which are noindexed, and it could easily get stuck/lost/etc. It seems like a waste of resources, and in some shadowy way bad for SEO. My developers are pushing for the third solution: using the hash URLs. This works on all hosts and keeps all functionality in the plugin self-contained (unlike noindex), and conserves crawl budget while keeping vehicle details page out of the index (unlike robots.txt). But I don't want Google to slap us 6-12 months from now because it doesn't like links like these (). Any thoughts or advice you guys have would be hugely appreciated, as I've been going in circles, circles, circles on this for a couple of days now. Also, I can provide a test site URL if you'd like to see the functionality in action.0