Googlebot Can't Access My Sites After I Repair My Robots File
-
Hello Mozzers,
A colleague and I have been collectively managing about 12 brands for the past several months and we have recently received a number of messages in the sites' webmaster tools instructing us that 'Googlebot was not able to access our site due to some errors with our robots.txt file'
My colleague and I, in turn, created new robots.txt files with the intention of preventing the spider from crawling our 'cgi-bin' directory as follows:
User-agent: *
Disallow: /cgi-bin/
After creating the robots and manually re-submitting it in Webmaster Tools (and receiving the green checkbox), I received the same message about Googlebot not being able to access the site, only difference being that this time it was for a different site that I manage.
I repeated the process and everything, aesthetically looked correct, however, I continued receiving these messages for each of the other sites I manage on a daily-basis for roughly a 10-day period.
Do any of you know why I may be receiving this error? is it not possible for me to block the Googlebot from crawling the 'cgi-bin'?
Any and all advice/insight is very much welcome, I hope I'm being descriptive enough!
-
Oleg gave a great answer.
Still I would add 2 things here:
1. Go to GWMT and under "Health" do a "Fetch as Googlebot" test.
This will tell you what pages are reachable.2. I`ve saw some occasions of server-level Googlebot blockage.
If your robots.txt is fine and your page contains no "no-index" tags, and yet you still getting an error message while fetching, you should get a hold on your access logs and check it for Googlebot user-agents to see if (and when) you were last visited.This will help you pin-point the issue, when talking to your hosting provider (or 3rd party security vendor).
If unsure, you can find Googlebot information (user agent and IPs ) at Botopedia.org.
-
A great answer
-
Maybe the spacing is off when you posted it here, but blank lines can affect robots.txt files. Try code:
User-agent: *
Disallow: /cgi-bin/
#End Robots#Also, check for robot blocking meta tags on the individual pages.
You can test to see if Google can access specific pages through GWT > Health > Blocked URLs (should see your robots.txt file contents int he top text area, enter the urls to test in the 2nd text area, then press "Test" at the bottom - test results will appear at the bottom of the page)
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Should I better noindex 'scripted' files in our portfolio?
Hello Moz community, As a means of a portfolio, we upload these PowerPoint exports – which are converted into HTML5 to maintain interactivity and animations. Works pretty nicely! We link to these exported files from our products pages. (We are a presentation design company, so they're pretty relevant). For example: https://www.bentopresentaties.nl/wp-content/portfolio/ecar/index.html However, they keep coming up in the Crawl warnings, as the exported HTML-file doesn't contain text (just code), so we get errors in: thin content no H1 missing meta description missing canonical tag I could manually add the last two, but the first warnings are just unsolvable. Therefore I figured we probably better noindex all these files… They appear to don't contain any searchable content and even then; the content of our clients work is not relevant for our search terms etc. They're mere examples, just in the form of HTML files. Am I missing something or should I better noindex these/such files? (And if so: is there a way to include a whole directory to noindex automatically, so I don't have to manually 'fix' all the HTML exports with a noindex tag in the future? I read that using disallow in robots.txt wouldn't work, as we will still link to these files as portfolio examples).
Intermediate & Advanced SEO | | BentoPres0 -
Help me to understand why this page doesn't rank
Hello everyone. I am trying to understand why most of my website category pages don't show up in the in the first 50 organic results on Google, despite my high website DA and high PA of those pages. We used to rank high a few years ago, not clear why most of those pages have almost completely disappeared. So, just to take one as an example, please, help me to understand why this page doesn't shows up in the first 50 organic search results for the keyword "cello sheet music": http://www.virtualsheetmusic.com/downloads/Indici/Cello.html I really can't explain why, unless we are under some sort of "penalization" or similar (a curse?!)... I have analyzed any possible metric, and can't find a logical explanation. Looking forward for your thoughts guys! All the best, Fab.
Intermediate & Advanced SEO | | fablau0 -
Why isn't Google indexing this site?
Hello, Moz Community My client's site hasn't been indexed by Google, although it was launched a couple of months ago. I've ran down the check points in this article https://moz.com/ugc/8-reasons-why-your-site-might-not-get-indexed without finding a reason why. Any sharp SEO-eyes out there who can spot this quickly? The url is: http://www.oldermann.no/ Thank you
Intermediate & Advanced SEO | | Inevo
INEVO, digital agency0 -
Baffled by this site's inability to rank
Hi guys, I've been working on a site for quite a while and it has a really good link profile, excellent content, no errors or penalties (as far as I can tell) but for some reason it consistently ranks below a lot of thin poor quality websites with spammy EMDs and a few obviously paid links from old-skool business directories etc. It has a significantly higher DA and linking root domains that almost all of them. Also it just bounces around from #40 to #28 to#35 to #40 to #28 on a weekly basis for many of our primary keywords. There just seems to be no logic to this and it goes against everything I know and everything we're taught. (I should probably point out that I've been doing this quite a while and have a number of other sites ranking extremely well in quite a few different verticals), Has anyone ever experienced anything like this and what did you do? Before I throw in the towel it would be good to hear from others and try and understand why this happens and if there is anything else I can try to help my client and fix it. Many thanks in advance.
Intermediate & Advanced SEO | | Blaze-Communication0 -
Why isn't my site being indexed by Google?
Our domain was originally pointing to a Squarespace site that went live in March. In June, the site was rebuilt in WordPress and is currently hosted with WPEngine. Oddly, the site is being indexed by Bing and Yahoo, but is not indexed at all in Google i.e. site:example.com yields nothing. As far as I know, the site has never been indexed by Google, neither before nor after the switch. What gives? A few things to note: I am not "discouraging search engines" in WordPress Robots.txt is fine - I'm not blocking anything that shouldn't be blocked A sitemap has been submitted via Google Webmaster Tools and I have "fetched as Google" and submitted for indexing - No errors I've entered both the www and non-www in WMT and chose a preferred There are several incoming links to the site, some from popular domains The content on the site is pretty standard and crawlable, including several blog posts I have linked up the account to a Google+ page
Intermediate & Advanced SEO | | jtollaMOT0 -
Robots.txt Blocked Most Site URLs Because of Canonical
Had a bit of a "Gotcha" in Magento. We had Yoast Canonical Links extension which worked well , but then we installed Mageworx SEO Suite.. which broke Canonical Links. Unfortunately it started putting www.mysite.com/catalog/product/view/id/516/ as the Canonical Link - and all URLs with /catalog/productview/* is blocked in Robots.txt So unfortunately We told Google that the correct page is also a blocked page. they haven't been removed as far as I can see but traffic has certainly dropped. We have also , at the same time had some Site changes grouping some pages & having 301 redirects. Resubmitted site map & did a fetch as google. Any other ideas? And Idea how long it will take to become unblocked?
Intermediate & Advanced SEO | | s_EOgi_Bear0 -
Can I dissavow links on a 301'd website?
So we are performing link removal for a client on his old website (A), which is being 301 redirected to his new website (B). We have identified toxic links on site A and are removing, once complete we will undo the current 301, confirm a new GWT account for website A, and then submit the disavow report. We would then like to reapply the 301 redirect to site B while we are waiting for Google to process the disavow report, the logic being we can retain some current rankings on site B while waiting for the disavow to process on site A. Has anyone had experience with this method? I foresee some potential issues here but am interested to here from others on this. Thanks!
Intermediate & Advanced SEO | | SEOdub1 -
Impact of simplifying website and removing 80% of site's content
We're thinking of simplifying our website which has grown to a very large size by removing all the content which hardly ever gets visited. The plan is to remove this content / make changes over time in small chunks so that we can monitor the impact on SEO. My gut feeling is that this is okay if we make sure to redirect old pages and make sure that the pages we remove aren't getting any traffic. From my research online it seems that more content is not necessarily a good thing if that content is ineffective and that simplifying a site can improve conversions and usability. Could I get people's thoughts on this please? Are there are risks that we should look out for or any alternatives to this approach? At the moment I'm struggling to combine the needs of SEO with making the website more effective.
Intermediate & Advanced SEO | | RG_SEO0