What can I do if Google Webmaster Tools doesn't recognize the robots.txt file?
-
I'm working on a recently hacked site for a client and and in trying to identify how exactly the hack is running I need to use the fetch as Google bot feature in GWT.
I'd love to use this but it thinks the robots.txt is blocking it's acces but the only thing in the robots.txt file is a link to the sitemap.
Unde the Blocked URLs section of the GWT it shows that the robots.txt was last downloaded yesterday but it's incorrect information. Is there a way to force Google to look again?
-
No, but they might write to it, modify it, or do all sorts of other nasty stuff I've seen hackers do when they get a hold of any writeable file on a system.
-
lol it's a robots text file. what are they going to do. Steal it? I should have clarified do a 777 to make sure that is not your problem, then yes change the permission to be tighter
-
Eesh I don't recommend 777. 644 or, if you're going to change it right back, 755 at most.
-
File permission maybe? Change it to 777 and try it again
-
If you have shell access on Linux you can use wget or GET or run lynx.
If google is getting the wrong robots file then your web server must be sending out something other than what you think is the robots file.
What happens if you do this in your browser:
-
Looking in my log files, Google hits robots.txt just about every time it crawls our site.
What are you trying to accomplish using fetch as Googlebot? Any chance CURL could do the job for you, or another tool that ignores robots.txt?
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How can I make it so that robots.txt is not ignored due to a URL re-direct?
Recently a site moved from blog.site.com to site.com/blog with an instruction like this one: /etc/httpd/conf.d/site_com.conf:94: ProxyPass /blog http://blog.site.com
Technical SEO | | rodelmo4
/etc/httpd/conf.d/site_com.conf:95: ProxyPassReverse /blog http://blog.site.com It's a Wordpress.org blog that was set as a subdomain, and now is being redirected to look like a directory. That said, the robots.txt file seems to be ignored by Google bot. There is a Disallow: /tag/ on that file to avoid "duplicate content" on the site. I have tried this before with other Wordpress subdomains and works like a charm, except for this time, in which the blog is rendered as a subdirectory. Any ideas why? Thanks!0 -
Clarification regarding robots.txt protocol
Hi,
Technical SEO | | nlogix
I have a website , and having 1000 above url and all the url already got indexed in Google . Now am going to stop all the available services in my website and removed all the landing pages from website. Now only home page available . So i need to remove all the indexed urls from Google . I have already used robots txt protocol for removing url. i guess it is not a good method for adding bulk amount of urls (nearly 1000) in robots.txt . So just wanted to know is there any other method for removing indexed urls.
Please advice.0 -
Webmaster tools question
Hi all. I have a question regarding http vs https. I have an https site and was wondering how to tell google in Webmaster tools to combine and use https. I have setup all sites in Webmaster tools. Both www and non www for both http and https. I see where to set up the www vs the non www but don't quite understand how to do the https part. I want all traffic to: https://www-creative -technology-solutions.com Thanks
Technical SEO | | twoacejr0 -
"Links to your site" in google webmaster tools not showing any data
Hello All I have a very strange query regarding the "Links to your site" section in webmaster's account my account does not show the Link data after so many days (more then 30 days) of verification. Can you please help me out how can I get my data in the webmaster's account?
Technical SEO | | barnesdorf
Please note I have verified the account using Google Analytic verification process. (does this affect?) I have seen this issue in my two websites which I have verified by Google Analytics. Please help me out.0 -
When do you use 'Fetch as a Google'' on Google Webmaster?
Hi, I was wondering when and how often do you use 'Fetch as a Google'' on Google Webmaster and do you submit individual pages or main URL only? I've googled it but i got confused more. I appreciate if you could help. Thanks
Technical SEO | | Rubix1 -
Unused url 'A' contains frameset - can it damage the other site B?
Client has an old unused site 'A' which I've discovered during my backlink research. It contains this source code below which frames the client's 'proper' site B inside the old unused url A in the browser address. Quick question - will google penalise the website B which is the one I'm optimising? Should the client be using a redirect instead? <frameset <span class="webkit-html-attribute-name">border='0' frameborder='0' framespacing='0'></frameset <span> <frame src="http: www.clientwebsite.co.ukb" frameborder="0" noresize="noresize" scrolling="yes"></frame src="http:> Please go to http://www.clientwebsite.co.ukB <noframes></noframes> Thanks, Lu.
Technical SEO | | Webrevolve0 -
What if my host doesn't have the 301 redirect feature?
Ok, So i need to do a 301 redirect but my host doesn't have the feature with htaccess. I currently use yahoo. What are my options?
Technical SEO | | bronxpad0 -
Why this page doesn't get indexed?
Hi, I've just taken over development and SEO for a site and we're having difficulty getting some key pages indexed on our site. They are two clicks away from the homepage, but still not getting indexed. They are recently created pages, with unique content on. The architecture looks like this:Homepage >> Car page >> Engine specific pageWhenever we add a new car, we link to its 'Car page' and it gets indexed very quickly. However the 'Engine pages' for that car don't get indexed, even after a couple of weeks. An example of one of these index pages are - http://www.carbuzz.co.uk/car-reviews/Volkswagen/Beetle-New/2.0-TSISo, things we've checked - 1. Yes, it's not blocked by robots.txt2. Yes, it's in the sitemap (http://www.carbuzz.co.uk/sitemap.xml)3. Yes, it's viewable to search spiders (e.g. the link is present in the html source)This page doesn't have a huge amount of unique content. We're a review aggregator, but it still does have some. Any suggestions as to why it isn't indexed?Thanks, David
Technical SEO | | soulnafein0