Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Dilemma about "images" folder in robots.txt
-
Hi, hope you're doing well.
I'm sure you're aware that Google has updated its webmaster technical guidelines to say that sites should allow access to their CSS and JavaScript files where possible. Google used to render web pages as text only. Now it says it can read CSS and JavaScript. By its own account, not allowing access to CSS files can result in suboptimal rankings:
"Disallowing crawling of Javascript or CSS files in your site’s robots.txt directly harms how well our algorithms render and index your content and can result in suboptimal rankings."
http://googlewebmastercentral.blogspot.com/2014/10/updating-our-technical-webmaster.html
We have allowed access to our CSS files, and Googlebot now sees our web pages more like a normal user would (tested in GWT).
Anyhow, this is my dilemma, and I'm sure a lot of other users are facing the same situation. Like any other e-commerce website, we have a lot of images. Our CSS files used to be inside our images folder, so I have allowed access to that. Here's the robots.txt: http://www.modbargains.com/robots.txt
Right now we are blocking the images folder, as it is very large and some of the images are very high resolution. We block it because we feel that Googlebot might spend almost all of its time crawling the "images" folder and not have enough time left to crawl other important pages, not to mention the heavy load on Google's servers and ours. We do have good, high-quality, original pictures, and we feel that we are losing potential rankings by blocking images. I was thinking of allowing ONLY the Google image bot access to the folder, but I still feel that Google might spend a lot of time on it.
**I was wondering whether Google decides something like "let me spend 10 minutes on the Google image bot and 20 minutes on the Google mobile bot", or whether it has separate "time spending" allocations for each of its bot types. I want to unblock the images folder, for now only for the Google image bot, but at the same time I fear that it might drastically hamper indexing of our important pages, as I mentioned before, because we have tons and tons of images and Google already spends enough time just crawling that folder.**
Any advice, recommendations, suggestions, technical guidance, or plan of action? I'm pretty sure I answered my own question, but I need confirmation from an expert that I am right to allow only the Google image bot access to my images folder.
Sincerely,
Shaleen Shah
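For readers weighing the same option, here is a minimal sketch of what "allow only the Google image bot" could look like. The `/images/` path and the example.com URLs are assumptions for illustration, not the asker's real setup. The check uses Python's standard-library `urllib.robotparser`, which only approximates how a crawler reads the file and is not Google's own parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the /images/ path is an assumption, not the real site's rules.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Allow: /images/

User-agent: *
Disallow: /images/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

image_url = "https://www.example.com/images/product.jpg"
page_url = "https://www.example.com/products/page.html"

# The image crawler has its own group, so it may fetch the image folder.
print("Googlebot-Image ->", parser.can_fetch("Googlebot-Image", image_url))
# Every other crawler falls back to the * group and is kept out of /images/.
print("Googlebot image ->", parser.can_fetch("Googlebot", image_url))
# Ordinary pages stay crawlable for everyone.
print("Googlebot page  ->", parser.can_fetch("Googlebot", page_url))
```

The point of the two groups is that a crawler with its own named group follows that group instead of the `*` group, so the image crawler is the only one let into the folder.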
-
Yup, my images send me traffic from Google Images on most of my sites, and attractive images attract hotlinks as well. At the moment people are hosting their images on a different domain (a CDN) and are still being credited for the images, but I haven't tried that myself, i.e. I don't know whether they've set some "ownership" somewhere and somehow.
-
I recommend allowing Google to crawl those images. Google optimizes its crawl rate, and once it has done a complete crawl it will understand how often to crawl certain areas of your site. My main concern would be that you are losing potential rankings and indexing from those images - if they are unique and high quality, you definitely want Google to crawl the images, understand the file names, and index them appropriately.
I wouldn't be concerned about Googlebot eating up your server resources. If it does become a problem, you can go back and adjust the bot's access through robots.txt, as you've done already. However, I would let it in first and only react if it becomes a problem.
I have tens of thousands of product images accessed by Googlebot, and it is no concern for my ecommerce company or its server resources. I'm not saying that it can't be a potential problem, but the benefit outweighs the risk of it being one - I choose a reactive stance in this situation.
Closely monitor your Google Webmaster Tools account, watch the crawl rate and statistics, and if it becomes an issue then decide on which image folders should or shouldn't be indexed.
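Alongside the Webmaster Tools reports, the server's own access log can show how much of Google's crawling goes to the image folder. This is a rough sketch of that complementary approach, which is not something described in the thread. It assumes logs in the common "combined" format and a hypothetical `/images/` prefix, and the sample lines are made up. User-agent strings can be spoofed, so treat the counts as indicative only.

```python
import re
from collections import Counter

# Pulls the request path and the user-agent out of a combined-format log line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_bot_hits(lines, prefix="/images/"):
    """Count Googlebot requests, split by bot type and by image folder vs. other."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines that are not in the expected format
        agent = match.group("agent")
        if "Googlebot" not in agent:
            continue  # ignore ordinary visitors and other crawlers
        bot = "Googlebot-Image" if "Googlebot-Image" in agent else "Googlebot"
        area = "images" if match.group("path").startswith(prefix) else "other"
        hits[(bot, area)] += 1
    return hits


# Made-up sample lines using documentation IP addresses.
sample = [
    '192.0.2.1 - - [10/Oct/2014:13:55:36 -0700] "GET /images/wheel.jpg HTTP/1.1" 200 2326 "-" "Googlebot-Image/1.0"',
    '192.0.2.1 - - [10/Oct/2014:13:55:40 -0700] "GET /products/wheel.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.9 - - [10/Oct/2014:13:56:01 -0700] "GET /images/wheel.jpg HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
]

for (bot, area), count in sorted(count_bot_hits(sample).items()):
    print(bot, area, count)
```

If the "images" counts start to dwarf the "other" counts, or server load climbs, that would be the signal to revisit robots.txt as suggested above.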
Related Questions
-
Does redirecting from a "bad" domain "infect" the new domain?
Hi all, So a complicated question that requires a little background. I bought unseenjapan.com to serve as a legitimate news site about a year ago. Social media and content growth has been good. Unfortunately, one thing I didn't realize when I bought this domain was that it used to be a porn site. I've managed to muck out some of the damage already - primarily, I got major vendors like McAfee and OpenDNS to remove the "porn" categorization, which has unblocked the site at most schools & locations w/ public wifi. The sticky bit, however, is Google. Google has the domain filtered under SafeSearch, which means we're losing - and will continue to lose - a ton of organic traffic. I'm trying to figure out how to deal with this and appeal the decision. Unfortunately, Google's Reconsideration Request form currently doesn't work unless your site has an existing manual action against it (mine does not). I've also heard that such requests, even if I did figure out how to make them, often just get ignored for months on end. Now, I have a backup plan. I've registered unseen-japan.com, and I could just move my site over to the new domain if I can't get this issue resolved. It would allow me to be on a domain with a clean history while not having to change my brand. But if I do that, and I set up 301 redirects from the former domain, will it simply cause the new domain to be perceived as an "adult" domain by Google? I.e., will the former URL's bad reputation carry over to the new one? I haven't made a decision one way or the other yet, so any insights are appreciated.
Intermediate & Advanced SEO | | gaiaslastlaugh0 -
Can Google read content that is hidden under a "Read More" area?
For example, when a person first lands on a given page, they see a collapsed paragraph but if they want to gather more information they press the "read more" and it expands to reveal the full paragraph. Does Google crawl the full paragraph or just the shortened version? In the same vein, what if you have a text box that contains three different tabs. For example, you're selling a product that has a text box with overview, instructions & ingredients tabs all housed under the same URL. Does Google crawl all three tabs? Thanks for your insight!
Intermediate & Advanced SEO | | jlo76130 -
Is Chamber of Commerce membership a "paid" link, breaking Google's rules?
Hi guys, This drives me nuts. I hear all the time that any time value is exchanged for a link that it technically violates Google's guidelines. What about real organizations, chambers of commerce, trade groups, etc. that you are a part of that have online directories with DO-follow links. On one hand people will say these are great links with real value outside of search and great for local SEO..and on the other hand some hardliners are saying that these technically should be no-follow. Thoughts???
Intermediate & Advanced SEO | | RickyShockley0 -
Using hreflang="en" instead of hreflang="en-gb"
Hello, I have a question regarding international SEO and the hreflang attribute. We are currently a B2B business in the UK. Our major market is England, with some international sales. We want to improve our rankings in other English-speaking countries and regions such as Ireland and the Channel Islands. My research has found regional Google search engines for Ireland (google.ie), Jersey (google.je) and Guernsey (google.gg). All of these regions have English as one of their main languages, so here are my questions. Because I use hreflang="en-gb" for my site language, am I excluding these countries and islands regionally? If I used hreflang="en", would it include these English-speaking regions and possibly improve rankings on these regional search engines? Thank you,
Intermediate & Advanced SEO | | SilverStar11 -
"sex" in non-adult domain name
I have a client with a domain that has "sex" in the domain name. For example, electronicsexpo.com. The domain ranks for a few keywords related to the services offered. It is an old domain that has been online for over 10 years. It ranks well for local keywords. No real SEO effort has been made on this domain, so it is rather a clean slate. I am going to be doing SEO on this site. Will the fact that the word "sex" exists in the name have any sort of negative consequence? There is ABSOLUTELY NOTHING adult related or pornographic on this site. I would think that search engines are sophisticated enough to differentiate, but would potential customers with things like parental filters be blocked from viewing content? Is this hurtful in any way? If so, would I be better off changing domain names? TIA
Intermediate & Advanced SEO | | inhouseseo0 -
Putting "noindex" on a page that's in an iframe... what will that mean for the parent page?
If I've got a page that is being called in an iframe, on my homepage, and I don't want that called page to be indexed.... so I put a noindex tag on the called page (but not on the homepage) what might that mean for the homepage? Nothing? Will Google, Bing, Yahoo, or anyone else, potentially see that as a noindex tag on my homepage?
Intermediate & Advanced SEO | | Philip-DiPatrizio0 -
Robots.txt, does it need preceding directory structure?
Do you need the entire preceding path in robots.txt for it to match? E.g., I know that if I add Disallow: /fish to robots.txt it will block:
/fish
/fish.html
/fish/salmon.html
/fishheads
/fishheads/yummy.html
/fish.php?id=anything
But would it block:
en/fish
en/fish.html
en/fish/salmon.html
en/fishheads
en/fishheads/yummy.html
en/fish.php?id=anything
**(taken from Robots.txt Specifications)**
I'm hoping it actually won't match; that way writing this particular robots.txt will be much easier! Basically, I want to block many URLs that have BTS- in them, such as:
http://www.example.com/BTS-something
http://www.example.com/BTS-somethingelse
http://www.example.com/BTS-thingybob
But I have other pages that I do not want blocked, in subfolders that also have BTS- in them, such as:
http://www.example.com/somesubfolder/BTS-thingy
http://www.example.com/anothersubfolder/BTS-otherthingy
Thanks for listening
Intermediate & Advanced SEO | | Milian0 -
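The prefix behaviour asked about here can be checked quickly with a small sketch. It assumes a single hypothetical rule, `Disallow: /BTS-`, and uses Python's standard-library `urllib.robotparser`, which matches rules against the start of the URL path. It stands in for a real crawler's parser and does not support wildcard extensions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule set for the question's example URLs.
ROBOTS_TXT = """\
User-agent: *
Disallow: /BTS-
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

urls = [
    "http://www.example.com/BTS-something",
    "http://www.example.com/BTS-thingybob",
    "http://www.example.com/somesubfolder/BTS-thingy",
    "http://www.example.com/anothersubfolder/BTS-otherthingy",
]

# A rule is compared with the beginning of the path, so only paths that
# start with /BTS- are blocked; the subfolder URLs are left alone.
blocked = {url: not parser.can_fetch("ExampleBot", url) for url in urls}

for url, is_blocked in blocked.items():
    print("blocked" if is_blocked else "allowed", url)
```

By the same logic, `Disallow: /fish` would not cover a path that begins with `/en/fish`, because that path starts with `/en/`, not `/fish`.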
Rel="canonical" and rel="alternate" both necessary?
We are fighting some duplicate content issues across multiple domains. We have a few Magento stores that have different country codes. For example: domain.com and domain.ca, where domain.com is the "main" domain. We have set up different rel="alternate" codes like:
The question is, do we need to add custom rel="canonical" tags to domain.ca that point to domain.com? For example, for domain.ca/product.html to point to:
Also, how far does rel="canonical" follow? For example, if we have:
domain.ca/sub/product.html canonical to domain.com/sub/product.html
then,
domain.com/sub/product.html canonical to domain.com/product.html
Intermediate & Advanced SEO | | AlliedComputer0