Googlebot Can't Access My Sites After I Repair My Robots File
-
Hello Mozzers,
A colleague and I have been managing about 12 brands for the past several months, and we recently received a number of messages in the sites' Webmaster Tools accounts telling us that 'Googlebot was not able to access our site due to some errors with our robots.txt file'.
My colleague and I, in turn, created new robots.txt files with the intention of preventing the spider from crawling our 'cgi-bin' directory as follows:
User-agent: *
Disallow: /cgi-bin/
After creating the robots.txt file and manually re-submitting it in Webmaster Tools (and receiving the green checkbox), I received the same message about Googlebot not being able to access the site. The only difference was that this time it was for a different site that I manage.
I repeated the process and everything looked correct; however, I continued to receive these messages daily for roughly 10 days, one for each of the other sites I manage.
Do any of you know why I may be receiving this error? Is it not possible for me to block Googlebot from crawling the 'cgi-bin' directory?
Any and all advice or insight is very welcome. I hope I'm being descriptive enough!
-
Oleg gave a great answer.
Still, I would add two things here:
1. Go to GWMT and, under "Health", run a "Fetch as Googlebot" test. This will tell you which pages are reachable.
2. I've seen some cases of server-level Googlebot blocking. If your robots.txt is fine and your pages contain no "noindex" tags, yet you still get an error message while fetching, get hold of your access logs and check them for Googlebot user agents to see if (and when) you were last visited. This will help you pinpoint the issue when talking to your hosting provider (or third-party security vendor).
If unsure, you can find Googlebot information (user agent and IPs) at Botopedia.org.
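For the access-log check in point 2, here is a rough Python sketch of one way to do it. It assumes your server writes the common Apache/Nginx "combined" log format; the sample lines, the IP addresses (documentation placeholders) and the function names are all made up for illustration, so adjust the pattern to whatever your host actually logs.

```python
import re
from datetime import datetime

# Assumed layout (combined log format):
# ip - - [timestamp] "request" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines):
    """Return (timestamp, ip, request, status) for lines whose user agent mentions Googlebot."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line)
        if match and "googlebot" in match.group("agent").lower():
            ts = datetime.strptime(match.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
            hits.append(
                (ts, match.group("ip"), match.group("request"), int(match.group("status")))
            )
    return hits


def last_visit(lines):
    """Return the most recent Googlebot hit, or None if there were none."""
    hits = googlebot_hits(lines)
    return max(hits, key=lambda hit: hit[0]) if hits else None


# Hypothetical sample data, not real log entries.
SAMPLE_LOG = [
    '192.0.2.10 - - [10/Oct/2012:13:55:36 +0000] "GET /robots.txt HTTP/1.1" 200 42 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '198.51.100.7 - - [10/Oct/2012:14:01:02 +0000] "GET / HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (Windows NT 6.1)"',
    '192.0.2.10 - - [11/Oct/2012:09:12:45 +0000] "GET /products/ HTTP/1.1" 403 199 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

if __name__ == "__main__":
    for ts, ip, request, status in googlebot_hits(SAMPLE_LOG):
        print(ts.isoformat(), ip, status, request)
    print("Last Googlebot visit:", last_visit(SAMPLE_LOG))
```

To run it on a real log, pass an open file instead of the sample list, e.g. `googlebot_hits(open("access.log"))`, where the path is whatever your host gives you. Googlebot requests that come back with 403 or 5xx statuses while normal browsers get 200 are the kind of thing worth raising with the hosting provider. Keep in mind that anyone can send a Googlebot user-agent string, so compare the IP column against the published Googlebot addresses before drawing conclusions.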
-
A great answer
-
Maybe the spacing got thrown off when you posted it here, but blank lines can affect robots.txt files. Try this code:
User-agent: *
Disallow: /cgi-bin/
#End Robots#
Also, check for robot-blocking meta tags on the individual pages.
You can test whether Google can access specific pages through GWT > Health > Blocked URLs. You should see your robots.txt file contents in the top text area; enter the URLs to test in the second text area, then press "Test" at the bottom. The test results will appear at the bottom of the page.
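If you want a quick local sanity check before going back to Webmaster Tools, something like the following Python sketch may help. It uses the standard-library `urllib.robotparser`, which is not Googlebot's own parser, so treat the result as a rough indication only. The `example.com` URLs and the `is_allowed` helper are placeholders I made up.

```python
from urllib.robotparser import RobotFileParser

# The rules from this thread, with no blank line between the two directives.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""

# Same rules with a blank line in the middle, to compare how this parser reads them.
ROBOTS_TXT_WITH_BLANK_LINE = """\
User-agent: *

Disallow: /cgi-bin/
"""


def is_allowed(robots_txt, user_agent, url):
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder URLs; swap in pages from your own site.
    urls = [
        "http://www.example.com/",
        "http://www.example.com/products/",
        "http://www.example.com/cgi-bin/script.pl",
    ]
    for label, text in [
        ("no blank line", ROBOTS_TXT),
        ("with blank line", ROBOTS_TXT_WITH_BLANK_LINE),
    ]:
        for url in urls:
            print(label, url, is_allowed(text, "Googlebot", url))
```

With the well-formed file, only the `/cgi-bin/` URL should come back as blocked and everything else as allowed. Comparing the two variants shows whether this particular parser treats the blank line as the end of a record, which is the kind of spacing problem mentioned above.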