BVREID

My beta site (beta.website.com) has been inadvertently indexed. Its cached pages are taking traffic away from our real website (website.com). Should I just "NO INDEX" the entire beta site and if so, what's the best way to do this? Are there any other precautions I should be taking? Please advise.

Vuly

For beta sites in future, I would recommend HTTP Basic Authentication so that spiders can't access them at all (this is for Apache):

AuthUserFile /var/www/sites/passwdfile
AuthName "Beta Realm"
AuthType Basic
require valid-user
Then run htpasswd -m /var/www/sites/passwdfile username to add a user (add -c the first time, to create the password file).

If you do this, Google's removal tool will also go "OK, it's not there, I should remove the page", because it usually checks that the content is actually gone from the page before removing it. If the text is still reachable, they MAY not process the removal request, even if the page has noindex (though I don't know for certain that's the case).
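To check that the password protection is actually in place, something like the following Python sketch might help. The function names are made up for illustration, and it assumes the server answers unauthenticated requests with a 401 and a WWW-Authenticate: Basic challenge, which is what the Apache config above should produce.

```python
import base64
import urllib.error
import urllib.request
from typing import Tuple


def basic_auth_header(username: str, password: str) -> str:
    """Build the Authorization header value a client sends for Basic auth."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def requires_basic_auth(status: int, www_authenticate: str) -> bool:
    """True if a response looks like a Basic auth challenge (401 + Basic)."""
    return status == 401 and www_authenticate.strip().lower().startswith("basic")


def fetch_challenge(url: str) -> Tuple[int, str]:
    """Request url without credentials; return (status, WWW-Authenticate)."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status, resp.headers.get("WWW-Authenticate", "")
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the challenge header is on the error object.
        return err.code, err.headers.get("WWW-Authenticate", "")
```

Calling requires_basic_auth(*fetch_challenge("http://beta.website.com/")) should come back True once the auth is live. If it comes back False, crawlers can still reach the page.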

pikka

In Webmaster Tools, set the subdomain up as its own site and verify it.
Put this in the robots.txt for the subdomain (beta.website.com/robots.txt):

User-agent: *
Disallow: /
You can then submit this site for removal in Google Webmaster Tools:

Click "optimization" and then "remove URLs"
Click "create a new removal request"
Type the URL "http://beta.website.com/" in there
Click "continue"
Click "submit request".
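Before submitting the removal request, it may be worth confirming that the robots.txt really blocks everything. Here is a rough sketch using Python's standard urllib.robotparser. The sample URLs and the blocks_everything helper are invented for the example, and the rules are parsed from a string rather than fetched from the live site.

```python
from urllib.robotparser import RobotFileParser

# The rules from the steps above, as they would appear in
# beta.website.com/robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

# Hypothetical beta URLs, for illustration only.
SAMPLE_URLS = [
    "http://beta.website.com/",
    "http://beta.website.com/products/widget.html",
]


def blocks_everything(robots_txt, urls, user_agent="Googlebot"):
    """Return True if none of the given URLs may be fetched by user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not any(parser.can_fetch(user_agent, url) for url in urls)


if __name__ == "__main__":
    print(blocks_everything(ROBOTS_TXT, SAMPLE_URLS))
```

This only tells you how a well-behaved crawler would read the file. It says nothing about whether the pages have been removed from the index yet.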

danatanseo

Agreed on all counts with Mark. In addition, if you haven't done this already, make sure you have canonical tags in place on your pages. Good luck!

Mark_Ginsberg

You can add noindex to the whole subdomain, and then wait for the crawlers to remove it.

Or you can register the subdomain with Webmaster Tools, block it in robots.txt with a general Disallow: / for the entire subdomain, and then use the URL removal tool in Webmaster Tools to remove the subdomain. A robots.txt block on its own won't work: it won't remove the pages that are already indexed, it'll just prevent them from being crawled again.

In your case, I would probably go the route of the robots.txt / URL removal tool. This will work to remove the pages from Google. Once this has happened, I would put the noindex tag on the whole subdomain and remove the robots.txt block. Crawlers can only see a noindex tag on pages they are allowed to crawl, so this way all search engines should drop the pages from their index and not index them again.

Mark
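For the noindex step, a quick way to spot-check pages is to look for the directive in either the robots meta tag or an X-Robots-Tag response header. The following Python is only a sketch: has_noindex is a made-up helper, it only looks at meta name="robots", and it does not handle crawler-specific tags or the "none" directive.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def has_noindex(html, x_robots_tag=""):
    """True if the page HTML or the X-Robots-Tag header value says noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    header = {part.strip().lower() for part in x_robots_tag.split(",")}
    return "noindex" in parser.directives or "noindex" in header
```

For example, has_noindex('<meta name="robots" content="noindex, nofollow">') is True, and so is has_noindex("", "noindex") for a page that sets the directive via the header instead of the tag.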
