Impact of "restricted by robots" crawler error in Webmaster Tools (WMT)
-
I have been wondering about this for a while now with regard to several of my sites. Webmaster Tools is listing pages that I have blocked in the robots.txt file as crawl errors. If I restrict Google from crawling them, how can it consider their existence an error? In one case, I have even removed the URLs from the index.
Do you have any idea of the negative impact associated with these errors?
And how do you suggest I remedy the situation?
Thanks for the help
-
Google is just showing you a warning: "hey, these are excluded, make sure that you want them excluded." They're not passing judgment on whether or not they should be excluded. So, as long as they're excluded on purpose, no worries.
-
Hi Patrick,
That section is simply there to advise you of any URLs that Google thinks may be wrongly excluded by robots.txt.
If the URLs are not wrongly excluded, don't worry about them showing up in WMT. The list is just an advisory.
Good luck!
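If you want to double-check that every URL in that report is one you meant to block, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt rules, the example.com URLs, and the `blocked_urls` helper are all made up for illustration; swap in your own file and the list exported from WMT. Note that this module does simple prefix matching, so it may not agree with Googlebot on rules that use wildcards.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; replace with your site's real file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /checkout/
"""


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt text disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical URLs, standing in for the list shown in the report.
reported = [
    "http://www.example.com/private/report.html",
    "http://www.example.com/checkout/cart",
    "http://www.example.com/products/widget.html",
]

blocked = blocked_urls(ROBOTS_TXT, reported)
for url in reported:
    status = "blocked" if url in blocked else "crawlable"
    print(f"{status}: {url}")
```

Anything that comes back as blocked but that you actually want crawled is a rule to revisit; everything else in the report can be ignored.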
Related Questions
-
Combining variants of "last modified", cache-duration etc
Hiya, As you know, you can specify the date of the last change of a document in various places, for example the sitemap, the http-header, ETag and also add an "expected" change, for example Cache-Duration via header/htaccess (or even the changefreq in the sitemap). Is it advisable or rather detrimental to use multiple variants that essentially tell browser/search engines the same thing? I.e. should I send a lastmod header AND ETag AND maybe something else? Should I send a cache duration at all if I send a lastmod? (Assume that I can keep them correct and consistent as the data for each will come from the very same place.) Also: Are there any clear recommendations on what change-indicating method should be used? Thanks for your answers! Nico
Technical SEO | | netzkern_AG0 -
Question about robots.txt
I just started my own e-commerce website and I host it on one of the popular e-commerce platforms, Pinnacle Cart. It has a lot of functions, like page sorting, a mobile website, etc. I adjusted the URL parameters in Google Webmaster Tools 3 weeks ago, but I still get the same duplicate errors on meta titles and descriptions from both the Google crawl and the SEOmoz crawl. I am not sure if I made a mistake in choosing Pinnacle Cart, because it is not that flexible in terms of editing the core website pages. There is no way to adjust the canonical, to insert robots directives on every page, etc. However, it does let me submit a single robots.txt file and edit the .htaccess. The website pages are in PHP format. For example, www.mycompany.com has a duplicate title and description with www.mycompany.com/site-map.html (there is no way of editing the title and description of my sitemap). Another error is that www.mycompany.com has a duplicate title and description with http://www.mycompany.com/brands?url=brands. Is it possible to exclude the URLs containing "url=" and my "site-map.html" in robots.txt? Or are the URL parameter settings in Google enough, and it just takes a lot of time? Can somebody please help me with the format of robots.txt? Thanks
Technical SEO | | paumer800 -
406 errors
Just started seeing 406 errors on our last crawl (all jpg related). Seomoz found 670 of these on my site when there were 0 before. I have checked the MIME and everything seems to be in the right order. So could it be that Seomoz-crawler is showing errors that aren't really errors?
Technical SEO | | smines0 -
Can name="author" register as a link?
Hi all, We're seeing a very strange result in Google Webmaster Tools. In "Links to your site", there is a site which we had nothing to do with (i.e. we didn't design or build it) showing over 1600 links to our site! I've checked the site several times now, and the only reference to us is in the name="author" tag. Clearly the agency that did their design / SEO have nicked our meta, forgetting to delete or change the author tag!! There are literally no other references to us on this site, and there haven't ever been (to our knowledge, at least), so I'm very puzzled as to why Google thinks there are 1600+ links pointing to us. The only thing I can think of is that Google will recognise name="author" content as a link... seems strange, though. Plus the content="" only contains our company name, not our URL. Can anybody shed any light on this for me? Thanks guys!
Technical SEO | | RiceMedia0 -
Does google "see through" php/asp redirects?
A lot of the time I see companies employing a technique like this: <a target="_blank" href="/external/wcpages/referral.aspx?URL=http%253a%252f%252fwww.xxxx.ca&ReferralType=W&ProfileID=22&ListingID=96&CategoryID=219">xxxxx</a> Or similarly with PHP, in an attempt to log all the clicks that exit their site from certain locations. When Googlebot comes along and crawls this page, does it still understand that this page links to www.xxxx.ca?
Technical SEO | | adriandg0 -
Google Webmaster tools error?
So I am trying to set the URL preference in google webmaster tools for my site. However when I try to save it it tells me to verify that I own the site. I have already done this so where can I go to verify I own the site exactly? Maybe I am wrong and I have not done this already but even on the homepage of webmaster tools I don't see an option to "verify".
Technical SEO | | ENSO0 -
Robots.txt usage
Hey Guys, I am about to make an important improvement to our site's robots.txt. We have a large number of properties on our site and we have different views for them: list, gallery and map view. By default the list view shows up and the user can navigate to the gallery view. We do not want gallery pages to get indexed and want to save our crawl budget for more important pages. This is one example from our site: http://www.holiday-rentals.co.uk/France/r31.htm When you click on "gallery view", the URL will remain the same in your address bar, but when you mouse over the "gallery view" tab it will show you the URL with the parameter "view=g". There are a number of parameters: "view=g", "view=l" and "view=m". http://www.holiday-rentals.co.uk/France/r31.htm?view=l http://www.holiday-rentals.co.uk/France/r31.htm?view=g http://www.holiday-rentals.co.uk/France/r31.htm?view=m Now my question is: if I restrict bots by adding "Disallow: ?view=" to our robots.txt, will it affect the list view too? I will be very thankful if you look into this for us. I will test this on some other site within our network before putting it on the important ones, to measure the impact, but will be waiting for your recommendations. Many thanks, Hassan
Technical SEO | | holidayseo0