Indexed, though blocked by robots.txt: Need to bother?
-
Hi,
We have intentionally blocked some of our website's files, which had been indexed for years. Now we receive the message "Indexed, though blocked by robots.txt" in GSC. As far as I know, we can ignore this? Is any action required? We thought of blocking them with meta tags, but these are PDF files.
Thanks
-
Hi there!
What Google is telling you is that URLs blocked by robots.txt are still in its index. Either these are URLs you probably don't want indexed, or, the other way around, important pages are being blocked but are indexed for other reasons.
If I might ask, why did you block those files through robots.txt?
The two most likely answers are:
1- You wanted to remove them from search results. If this is your case, you've solved only part of the problem. What you should have done is (while still allowing robots to crawl those URLs) apply noindex rules, and only then, after a sufficient time, block them in robots.txt. Keep in mind that noindex can be set in the HTTP header, since non-HTML files can't carry a meta robots tag.
2- You wanted to optimize Googlebot's crawling time. In that case, you've done it correctly and there is nothing to worry about.
Hope this helps.
Best of luck.
GR
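For the PDF case in point 1, here is a minimal sketch of the idea in Python. It assumes your files are served by code you control; `NOINDEX_EXTENSIONS`, `extra_headers` and `crawler_can_fetch` are hypothetical names made up for illustration, not part of any Google or Moz tool. The `X-Robots-Tag` header itself is the HTTP-header counterpart of the meta robots tag. The second function shows why order matters: while robots.txt blocks a URL, the crawler never fetches it and so never sees the header.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical setting: file types that should drop out of the index.
NOINDEX_EXTENSIONS = (".pdf",)


def extra_headers(path: str) -> dict:
    """Return the extra response headers to send for a requested path."""
    if path.lower().endswith(NOINDEX_EXTENSIONS):
        # Non-HTML files cannot carry a meta robots tag, so use the header.
        return {"X-Robots-Tag": "noindex"}
    return {}


def crawler_can_fetch(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Check whether the given robots.txt rules let the crawler fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    blocked = "User-agent: *\nDisallow: /docs/"
    url = "https://example.com/docs/guide.pdf"
    print(extra_headers("/docs/guide.pdf"))
    # While this is False, the noindex header is never seen by the crawler.
    print(crawler_can_fetch(blocked, url))
```

In practice the same header is usually set in the web server configuration rather than in application code; the sketch only illustrates the logic.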