Google is indexing blocked content in robots.txt
-
Hi,Google is indexing some URLs that i don't want to be indexed and also is indexing the same URLs with https. This URLs are blocked in the file robots.txt.I've tried to block this URLs through Google WebmasterTools but Google doesn't let me do it because this URL are httpsThe file robots.txt is correct so, what can i do to avoid this content to be indexed?
-
I think you will find that the URL´s in Google´s index are either:
- indexed prior to putting in the robots.txt disallow in place - check in the google serp and click on "in cache" to see the date.
- Heavily linked to by other external domains.
- Both of the above.
@cleverphd has a great solution. Follow that.
-
This will sound backwards but it works.
-
Add the meta noindex tag to all pages you want out of the index.
-
Take those same pages out of the robots.txt and allow them to be crawled.
The meta noindex tells Google to remove the page from the index. It is preferred over using robots.txt
http://moz.com/learn/seo/robotstxt
The robot.txt - blocks Google from crawling the page, but things can still show up if there are other pages linking to the page you are trying to remove.
http://www.youtube.com/watch?v=KBdEwpRQRD0
You have to allow Google to crawl the pages (by taking them out of the robots.txt) so it can read the noindex meta tags that then tell Google to take them out of the index.
-
-
Thank you, but that is not the problem. The file robots.txt is done since a long time ago.
-
It seems you have added/modified Robot.txt file later. Wait for some time, Say 15 days.
Also ensure syntax for robot.txtRegards,
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google not indexing /showing my site in search results...
Hi there, I know there are answers all over the web to this type of question (and in Webmaster tools) however, I think I have a specific problem that I can't really find an answer to online. site is: www.lizlinkleter.com Firstly, the site has been live for over 2 weeks... I have done everything from adding analytics, to submitting a sitemap, to adding to webmaster tools, to fetching each individual page as googlebot and then submitting to index via webmaster tools. I've checked my robot files and code elsewhere on the site and the site is not blocking search engines (as far as I can see) There are no security issues in webmaster tools or MOZ. Google says it has indexed 31 pages in the 'Index Status' section, but on the site dashboard it says only 2 URLS are indexed. When I do a site:www.lizlinketer.com search the only results I get are pages that are excluded in the robots file: /xmlrpc.php & /admin-ajax.php. Now, here's where I think the issue stems from - I developed the site myself for my wife and I am new to doing this, so I developed it on the live URL (I now know this was silly) - I did block the content from search engines and have the site passworded, but I think Google must have crawled the site before I did this - the issue with this was that I had pulled in the Wordpress theme's dummy content to make the site easier to build - so lots of nasty dupe content. The site took me a couple of months to construct (working on it on and off) and I eventually pushed it live and submitted to Analytics and webmaster tools (obviously it was all original content at this stage)... But this is where I made another mistake - I submitted an old site map that had quite a few old dummy content URLs in there... I corrected this almost immediately, but it probably did not look good to Google... My guess is that Google is punishing me for having the dummy content on the site when it first went live - fair enough - I was stupid - but how can I get it to index the real site?! My question is, with no tech issues to clear up (I can't resubmit site through webmaster tools) how can I get Google to take notice of the site and have it show up in search results? Your help would be massively appreciated! Regards, Fraser
Technical SEO | | valdarama0 -
Google+ Contibutor to: Link To Main Domain or Content Page?
Which is the best practice for the link to claim authorship for a guest post? I have tried both the main domain URL in the "contributor to" section of my Google plus and the page URL where the post is and both show my picture when testing in the Structured Data Testing Tool. Which is best to use? Thanks in advance.
Technical SEO | | WSIDW0 -
Does this content get indexed?
A lot of content on this site is displayed in pop up pages. Eg. Visit the Title page http://www.landgate.wa.gov.au/corporate.nsf/web/Certificate+of+Title To access the sample report or fee details, the info is shown in a pop up page with a strange url. Example: http://www.landgate.wa.gov.au/corporate.nsf/web/Certificate+of+Title+-+Fee+Details I can't see any of these pages being indexed in Google or other search engines when I do a site search: http://www.landgate.wa.gov.au/corporate.nsf/web/Certificate+of+Title+-+Fee+Details Is there a way to get this content indexed besides telling the client to restructure this content?
Technical SEO | | Bigheadigital0 -
How do you know what version of your site of Google is in their index?
This is going to sound like a strange question, but I am trying to understand which version of our site is in the index. You might think this is an obvious question, but here is why I am asking: 1. Today I searched for a specific keyword and found the listing. 2. I liked on the right arrow next to the listing and checked the cache date. It says 6/28 and shows the site as of 6/28. 3. I expected to see that we were just indexed as we jumped several pages since yesterday and I had just checked two days ago and we hadn't moved at all. It seems like Google may have taken the changes we made on 7/2 but since it is showing 6/28, I am note sure. Since this is confusing, here is the chronology: 1. Made changes 6/20. 2. Site appeared to be indexed on 6/28. 3. Made changes on 7/2. 4. Checked the site on 7/2 and we were in position 60. Checked the site on 7/4 and we were in position 61. 5.. Checked the site today (7/6) and see we are in position 8. The cache date shows as 6/28. I suspect that Google just indexed us yesterday and is reflecting the changes I made on 7/2. But the fact that it says it was cached on 6/28 seems to sugges otherwise. I want to be sure I know which version got us the good rankings - is there any way to be sure? Thanks!!
Technical SEO | | trophycentraltrophiesandawards0 -
Directory Indexed in Google, that I dont want, How to remove?
Hi One of my own websites, having a slight issue, Google have indexed over 500+ pages and files from a template directory from my eCommerce website. In google webmaster tools, getting over 580 crawl errors mostly these ones below I went into my robots text file and added Disallow: /skins*
Technical SEO | | rfksolutionsltd
Disallow: /skin1* Will this block Google from searching them again? and how do I go about getting the 500 pages that are already indexed taken out? Any help would be great | http://www.rfkprintsolutions.co.uk/skin1/modules/Subscriptions/subscription_priceincart.tpl | 403 error | Jan 15, 2012 |
|http://www.rfkprintsolutions.co.uk/skin1/modules/Subscriptions/subscription_info_inlist.tpl | 403 error | Jan 15, 2012 |
|http://www.rfkprintsolutions.co.uk/skin1/modules/Subscriptions/subscriptions_admin.tpl | 403 error | Jan 15, 2012 |
0 -
Robots.txt blocking site or not?
Here is the robots.txt from a client site. Am I reading this right --
Technical SEO | | 540SEO
that the robots.txt is saying to ignore the entire site, but the
#'s are saying to ignore the robots.txt command? See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file To ban all spiders from the entire site uncomment the next two lines: User-Agent: * Disallow: /0 -
Will Google index a 301 redirect for a new site?
So here is the problem... We have setup a 301redirect for our clients website. When you search the clients name it comes up with the old .co.uk website. We have made this redirect to the new .com website. However on the SERPs when it shows the .co.uk it shows the old title pages which currently say 'Holding Page'. When you click on that link it takes you to the fully functioning .com website. My question is, will the title tags in the SERPs which show the .co.uk update to the new ones from the .com? I'm thinking it will be just a case of Google catching up on things and it will sort itself out eventually. If anyone could help I would REALLY appreciate it. Thanks Chris
Technical SEO | | Weerdboil0 -
Robots.txt File Redirects to Home Page
I've been doing some site analysis for a new SEO client and it has been brought to my attention that their robots.txt file redirects to their homepage. I was wondering: Is there a benfit to setup your robots.txt file to do this? Will this effect how their site will get indexed? Thanks for your response! Kyle Site URL: http://www.radisphere.net/
Technical SEO | | kchandler0