Blocking Pages Via Robots, Can Images On Those Pages Be Included In Image Search
-
Hi!
I have pages within my forum where visitors can upload photos. When they upload a photo they provide a short statement about it but no real information about the image, definitely not enough for the page to be deemed worthy of being indexed. Our industry, however, leans heavily on images, and having the images in Google Image Search is important to us.
The URL structure looks like this: domain.com/community/photos/~username~/picture111111.aspx
I wish to block the whole folder from Googlebot to keep these low-quality pages out of Google's main search results. This would be something like:
User-agent: googlebot
Disallow: /community/photos/
Can I disallow Googlebot specifically, rather than using User-agent: *, so that Googlebot-Image can still pick up the photos? I plan on configuring a way to add meaningful alt attributes and image names to assist in visibility, but what I need to know first is whether I can block the pages and still get the images picked up. Is this possible?
Thanks!
Leona
-
Are you seeing the images getting indexed, though? Even if GWT recognizes the robots.txt directives, blocking the pages may essentially keep the images from having any ranking value. Like Matt, I'm not sure this will work in practice.
Another option would be to create an alternate path to just the images, like an HTML sitemap with just links to those images and decent anchor text. The ranking power still wouldn't be great (you'd have a lot of links on this page, most likely), but it would at least kick the crawlers a bit.
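If it helps, here is a rough Python sketch of how such an HTML sitemap page could be generated. The image URLs, anchor texts, page title, and the function name `build_image_sitemap` are all made up for illustration; how you actually pull the image list out of the forum is up to you.

```python
from html import escape


def build_image_sitemap(images):
    """Build a plain HTML page that links to each image.

    `images` is a list of (url, anchor_text) pairs (hypothetical data).
    """
    items = "\n".join(
        f'    <li><a href="{escape(url, quote=True)}">{escape(text)}</a></li>'
        for url, text in images
    )
    return (
        "<!DOCTYPE html>\n<html>\n"
        "<head><title>Community photos</title></head>\n"
        f"<body>\n  <ul>\n{items}\n  </ul>\n</body>\n</html>\n"
    )


# Hypothetical example entries; replace with real image URLs and descriptions.
page = build_image_sitemap([
    ("https://example.com/images/picture111111.jpg", "Red oak dining table"),
    ("https://example.com/images/picture222222.jpg", "Walnut desk & chair"),
])
print(page)
```

The anchor text is escaped so that characters like & in user-supplied descriptions do not break the markup.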
-
Thanks Matt for your time and assistance! Leona
-
Hi Leona - what you have done is along the lines of what I thought would work for you. Sorry if I wasn't clear in my original response. I thought you were asking whether Googlebot-Image would still pick up the photos if your robots.txt only disallowed Googlebot. As I said, that wouldn't be the case: Googlebot-Image follows the rules set out for Googlebot unless you specify otherwise, for example with the Allow directive I mentioned. Glad it has worked for you - keep us posted on your results.
-
Hi Matt,
Thanks for your feedback!
It is not my belief that Googlebot overrides Googlebot-Image; otherwise, specifying rules for one specific Google bot wouldn't work, correct?
I set up the following:
User-agent: googlebot
Disallow: /community/photos/
User-agent: googlebot-Image
Allow: /community/photos/
I tested the results in Google Webmaster Tools which indicated:
Googlebot: Blocked by line 26: Disallow: /community/photos/ (Detected as a directory; specific files may have different restrictions)
Googlebot-Image: Allowed by line 29: Allow: /community/photos/ (Detected as a directory; specific files may have different restrictions)
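For anyone who wants to reason about this result outside GWT, here is a minimal Python sketch of how a crawler might pick its group. It assumes that a crawler uses the first of its own tokens that has a group in the file and otherwise falls back to `*`. The helper names `parse_groups` and `select_group` and the token lists are my own, and this is a simplified model, not Google's actual parser.

```python
# Simplified model of how a crawler picks its robots.txt group.
# Assumption: the crawler uses the first of its tokens that has a group,
# then falls back to "*". This is a sketch, not Google's actual parser.

ROBOTS_TXT = """\
User-agent: googlebot
Disallow: /community/photos/

User-agent: googlebot-Image
Allow: /community/photos/
"""


def parse_groups(robots_txt):
    """Return {lowercased user-agent token: [(directive, path), ...]}."""
    groups = {}
    current_agents = []
    reading_agents = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not reading_agents:
                # A user-agent line after rules starts a new group.
                current_agents = []
                reading_agents = True
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow"):
            reading_agents = False
            for agent in current_agents:
                groups[agent].append((field, value))
    return groups


def select_group(groups, tokens):
    """Pick the group for a crawler; `tokens` is ordered most specific first."""
    for token in tokens:
        if token.lower() in groups:
            return token.lower()
    return "*" if "*" in groups else None


groups = parse_groups(ROBOTS_TXT)

# Hypothetical token lists, most specific first.
for tokens in (["Googlebot"], ["Googlebot-Image", "Googlebot"]):
    name = select_group(groups, tokens)
    print(tokens[0], "->", name, groups.get(name))
```

Under this model, removing the googlebot-Image group makes `select_group` fall back to the googlebot group for the image crawler, so the Disallow would then apply to the images as well.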
Thanks for your help!
Leona
-
Hi Leona
Googlebot-Image and the other bots that Google uses follow the rules set out for Googlebot unless they are given rules of their own, so blocking only Googlebot would also block your images. I don't think there is a way around this using the Disallow directive alone: you would be blocking the directory that contains your images, so they won't be indexed.
Something you may want to consider is the Allow directive -
Disallow: /community/photos/
Allow: /community/photos/~username~/
that is if Google is already indexing images under the username path?
The Allow directive will only win if its path contains at least as many characters as the Disallow path, so bear in mind that if you had the following:
Disallow: /community/photos/
Allow: /community/photos/
the Allow will win out and nothing will be blocked. Please note that I haven't used the Allow directive myself, but I looked into it in depth when I studied robots.txt for my own sites. It would be good to hear from someone who has experience with this directive. Hope this helps.
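To make the length rule concrete, here is a small Python sketch of the precedence as I understand it: the longest matching path wins, and a tie goes to Allow. The function name `is_allowed` and the `index.aspx` path are made up for the example, wildcards are not handled, and this is an illustration rather than Google's real matcher.

```python
def is_allowed(path, rules):
    """Decide whether `path` may be crawled under one group's rules.

    `rules` is a list of (directive, path_prefix) pairs. The longest
    matching prefix wins and a tie goes to "allow". Wildcards (* and $)
    are NOT handled in this sketch.
    """
    best_len = -1
    allowed = True  # no matching rule means the path is crawlable
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue
        longer = len(prefix) > best_len
        tie_for_allow = len(prefix) == best_len and directive == "allow"
        if longer or tie_for_allow:
            best_len = len(prefix)
            allowed = directive == "allow"
    return allowed


# The longer Allow path carves an exception out of the Disallow.
carve_out = [
    ("disallow", "/community/photos/"),
    ("allow", "/community/photos/~username~/"),
]
print(is_allowed("/community/photos/~username~/picture111111.aspx", carve_out))
print(is_allowed("/community/photos/index.aspx", carve_out))  # hypothetical path

# Equal-length paths: the Allow wins the tie, so nothing is blocked.
tie = [
    ("disallow", "/community/photos/"),
    ("allow", "/community/photos/"),
]
print(is_allowed("/community/photos/~username~/picture111111.aspx", tie))
```

The first and third checks come out as allowed and the second as blocked, which matches the two cases described above.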