Removing a site from Google's index
-
We have a site we'd like to have pulled from Google's index. Back in late June, we disallowed robot access to the site through the robots.txt file and added a robots meta tag with "no index,no follow" commands. The expectation was that Google would eventually crawl the site and remove it from the index in response to those tags. The problem is that Google hasn't come back to crawl the site since late May. Is there a way to speed up this process and communicate to Google that we want the entire site out of the index, or do we just have to wait until it's eventually crawled again?
-
ok. Not abundantly clear upon first reading. Thank you for your help.
-
Thank you for pointing that out Arlene. I do see it now.
The statement before that line is of key importance for an accurate quote. "If you own the site, you can verify your ownership in Webmaster Tools and use the verified URL removal tool to remove an entire directory from Google's search results."
It could be worded better but what they are saying is AFTER your site has already been removed from Google's index via the URL removal tool THEN you can block it with robots.txt. The URL removal tool will remove the pages and keep them out of the index for 90 days. That's when changing the robots.txt file can help.
-
"Note: To ensure your directory or site is permanently removed, you should use robots.txt to block crawler access to the directory (or, if you’re removing a site, to your whole site)."
The above is a quote from the page. You have to expand the section I referenced in my last comment. Just re-posting google's own words.
-
I thought you were offering a quote from the page. It seems that is your summarization. I apologize for my misunderstanding.
I can see how you can make that conclusion but it not accurate. Robots.txt does not ensure a page wont get indexed. I always recommend use of the noindex tag which should be 100% effective for the major search engines.
-
Go here: http://www.google.com/support/webmasters/bin/answer.py?answer=164734
Then expand the option down below that says: "<a class="zippy zippy-track zippy-collapse" name="RemoveDirectory">I want to remove an entire site or the contents of a directory from search results"</a>
They basically instruct you to block all robots in the robots.txt file, then request removal of your site. Once it's removed, the robots file will keep it from getting back into the index. They also recommend putting a "noindex" meta tag on each page to ensure nothing will get picked up. I think we have it taken care of at this point. We'll see
-
Arlene, I checked the link you offered but I could not locate the quote you offered anywhere on the page. I am sure it is referring to a different context. Using robots.txt as a blocking tool is fine BEFORE a site or page is indexed, but not after.
-
I used the removal tool and just entered a "/" which put in a request to have everything in all of my site's directories pulled from the index. And I have left "noindex" tags in place on every page. Hopefully this will get it done.
Thanks for your comments guys!
-
We blocked robots from accessing the site because Google told us to. This is straight from the webmaster tools help section:
Note: To ensure your directory or site is permanently removed, you should use robots.txt to block crawler access to the directory (or, if you’re removing a site, to your whole site).
-
I have webmaster tools setup, but I don't see an option to remove the whole site. There is a URL removal tool, but there are over 700 pages I want pulled out of the index. Is there an option in webmaster tools to have the whole site pulled from the index?
-
Actually, since you have access to the site, you can leave the robots.txt at disallowed -- if you go into Google Webmaster Tools, verify your site, and request removal of your entire site. Let me know if you'd like a link on this with more information. This will involve adding an html file or meta tag to your site to verify you have ownership.
-
Thank you. Didn't realize we were shooting ourselves in the foot.
-
Hi Arlene.
The problem is that when you blocked the site with robots.txt, you are preventing Google from re-crawling your site so they cannot see the noindex tag. If you have properly placed the noindex tag on all the pages in your site, then modify your robots.txt file to allow Google to see your site. Once that happens Google will begin crawling your site and then be able to deindex your pages.
The only other suggestion is to submit a sitemap and/or remove the "nofollow" tag. With the nofollow tag on all your pages, Google may visit your site for a single page at a time since you are telling the crawler not to follow any links it finds. You are blocking it's normal discovery of your site.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Why does Google's search results display my home page instead of my target page?
Why does Google's search results display my home page instead of my target page?
Technical SEO | | h.hedayati6712365410 -
Paypal instead of Merchant's account and will the site still move up?
Hello, Will an Ecommerce site still move up in a niche if it only accepts PayPal and doesn't have a merchant's account on it? Thanks.
Technical SEO | | BobGW0 -
Godaddy and Soft 404's
Hello, We've found that a website we manage has a list of not-found URLS in Google webmaster tools which are "soft 404's " according to Google. I went to the hosting company GoDaddy to explain and to see what they could do. As far as I can see GoDaddy's server are responding with a 200 HTTP error code - meaning that the page exists and was served properly. They have sort of disowned this as their problem. Their server is not serving up a true 404 response. This is a WordPress site. 1) Has anyone seen this problem before with GoDaddy?Is it a GoDaddy problem?2) Do you know a way to sort this issue? When I use the command site:mydomain.co.uk the number of URLs indexed is about right except for 2 or 3 "soft URLs" . So I wonder why webmaster tools report so many yet I can't see them all in the index?
Technical SEO | | AL123al0 -
Bigcommerce only allows us to have https for our store only, not the other pages on our site, so we have a mix of https and http, how is this hurting us and what's the best way to fix?
So we aren't interested in paying a thousand dollars a month just to have https when we feel it's the only selling point of that package, so we have https for our store and the rest of the site blogs and all are http. I'm wondering if this would count as duplicate content or give us some other unforeseen penalty due to the half way approach of https being implemented. If this is hurting us, what would you recommend as a solution?
Technical SEO | | Deacyde0 -
Does Site Structure Affect Google
Hi - I'm pretty new at this. We’re running an e-commerce affiliate site at http://www.mydomain.com. So we don’t take payments but customer gets passed through to third party sites when they select to buy a product. We have a blog at http://www.mydomain.com/news. I think Google is treating these 2 sites as as separate sites for PR. For this reason we're thinking about moving this to http://news.mydomain.com. Anyone have any experience in this?
Technical SEO | | richardjoseph0 -
Https-pages still in the SERP's
Hi all, my problem is the following: our CMS (self-developed) produces https-versions of our "normal" web pages, which means duplicate content. Our it-department put the <noindex,nofollow>on the https pages, that was like 6 weeks ago.</noindex,nofollow> I check the number of indexed pages once a week and still see a lot of these https pages in the Google index. I know that I may hit different data center and that these numbers aren't 100% valid, but still... sometimes the number of indexed https even moves up. Any ideas/suggestions? Wait for a longer time? Or take the time and go to Webmaster Tools to kick them out of the index? Another question: for a nice query, one https page ranks No. 1. If I kick the page out of the index, do you think that the http page replaces the No. 1 position? Or will the ranking be lost? (sends some nice traffic :-))... thanx in advance 😉
Technical SEO | | accessKellyOCG0 -
I am Posting an article on my site and another site has asked to use the same article - Is this a duplicate content issue with google if i am the creator of the content and will it penalize our sites - or one more than the other??
I operate an ecommerce site for outdoor gear and was invited to guest post on a popular blog (not my site) for a trip i had been on. I wrote the aritcle for them and i also will post this same article on my website. Is this a dup content problem with google? and or the other site? Any Help. Also if i wanted to post this same article to 1 or 2 other blogs as long as they link back to me as the author of the article
Technical SEO | | isle_surf0 -
Site just will not be reincluded in Google's Index
I asked a question about this site (www.cookinggames.com.au) some time ago http://www.seomoz.org/qa/view/38488/site-indexing-google-doesnt-like-it and had some very helpful answers which were great. However I'm still no further ahead. I have added some more content, submitted a new XML sitemap, removed the 'lorem ipsum...' Now it seems that even Bing have ditched the site too. The number 1 result in Australia for the search term 'cooking games' is now this one - http://www.cookinggames.net.au/ which surely is not so much better to deserve a #1 spot whilst my site is deindexed? I have just had another reconsideration request 'denied' and am absolutely out of ideas/. If anyone can help suggest what I need to do... or even suggest how I can get feedback from the search engines what's wring that would be fantastic. Thank you David
Technical SEO | | OzDave0