What can I do if Google Webmaster Tools doesn't recognize the robots.txt file?
-
I'm working on a recently hacked site for a client, and in trying to identify exactly how the hack is running I need to use the Fetch as Googlebot feature in GWT.
I'd love to use this, but it thinks the robots.txt is blocking its access, even though the only thing in the robots.txt file is a link to the sitemap.
Under the Blocked URLs section of GWT it shows that the robots.txt was last downloaded yesterday, but the information it shows is incorrect. Is there a way to force Google to look again?
-
No, but they might write to it, modify it, or do all sorts of other nasty stuff I've seen hackers do when they get a hold of any writeable file on a system.
-
lol, it's a robots text file. What are they going to do, steal it? I should have clarified: set it to 777 to make sure that is not your problem, then yes, change the permissions to be tighter.
-
Eesh I don't recommend 777. 644 or, if you're going to change it right back, 755 at most.
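For anyone who prefers to check the mode from a script rather than by eye, here is a minimal Python sketch, assuming a Linux host. The helper names (`current_mode`, `is_world_writable`, `tighten`) are made up for illustration, and the demo runs on a throwaway temp file rather than a real robots.txt.

```python
import os
import stat
import tempfile


def current_mode(path):
    """Return only the permission bits of path, e.g. 0o644."""
    return stat.S_IMODE(os.stat(path).st_mode)


def is_world_writable(path):
    """True if 'other' users have write permission (the risky part of 777)."""
    return bool(current_mode(path) & stat.S_IWOTH)


def tighten(path, mode=0o644):
    """Set owner read/write, group and other read-only by default."""
    os.chmod(path, mode)


# Demo on a temporary file standing in for robots.txt (hypothetical content).
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "robots.txt")
    with open(demo, "w") as fh:
        fh.write("Sitemap: http://www.example.com/sitemap.xml\n")
    os.chmod(demo, 0o777)  # simulate the wide-open state
    print(oct(current_mode(demo)), is_world_writable(demo))
    tighten(demo)  # back to 644
    print(oct(current_mode(demo)), is_world_writable(demo))
```

A web server only needs read access to serve robots.txt, which is why 644 is normally enough.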
-
File permissions, maybe? Change it to 777 and try again.
-
If you have shell access on Linux you can use wget or GET or run lynx.
If Google is getting the wrong robots file, then your web server must be sending out something other than what you think is the robots file.
What happens if you do this in your browser:
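Whatever the browser shows, one way to act on the point above is to take the body the server really returns (for instance, saved from wget) and run it through a robots.txt parser. This is a rough Python sketch using the standard library's `urllib.robotparser`. The function name `is_allowed` and the example.com URLs are placeholders, not anything from this thread, and Python's parser is not guaranteed to match Google's own interpretation in every edge case.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, url):
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# What the asker expects the file to contain: only a sitemap reference.
sitemap_only = "Sitemap: http://www.example.com/sitemap.xml\n"

# What a tampered or stale file might contain instead (hypothetical).
blocking = "User-agent: *\nDisallow: /\n"

for label, body in (("sitemap only", sitemap_only), ("blocking", blocking)):
    allowed = is_allowed(body, "Googlebot", "http://www.example.com/")
    print(label, "-> Googlebot allowed:", allowed)
```

If the body you saved from the server behaves like the second case, the file on disk is not the file being served.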
-
Looking in my log files, Google hits robots.txt just about every time it crawls our site.
What are you trying to accomplish using Fetch as Googlebot? Any chance curl could do the job for you, or another tool that ignores robots.txt?
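As a sketch of that idea, the Python below requests the same URL twice, once with Googlebot's published user-agent string and once with a generic browser-style string, and reports whether the responses differ. The helper names and the browser string are assumptions for illustration. Note the limits: this ignores robots.txt entirely, and a hack that cloaks by Google's IP addresses rather than by user agent will not be exposed this way.

```python
import urllib.request

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64)"  # placeholder browser-style string


def build_request(url, user_agent):
    """Create a GET request that presents the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent, timeout=10):
    """Return (status, body). urlopen raises HTTPError on 4xx/5xx responses."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read()


def compare(url):
    """Fetch url as Googlebot and as a browser and flag any difference."""
    as_bot = fetch(url, GOOGLEBOT_UA)
    as_browser = fetch(url, BROWSER_UA)
    return {"googlebot": as_bot, "browser": as_browser, "differs": as_bot != as_browser}
```

Calling `compare("http://www.example.com/robots.txt")` with the real site's URL in place of the placeholder would show whether the server hands a different file to something claiming to be Googlebot. Pages with dynamic content can differ between any two requests, so a difference is a lead to inspect, not proof of cloaking.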