Can I rely on just robots.txt?

spiralsites

We have a test version of a client's website on a separate server before it goes onto the live server.

Some code from the test site has somehow led Google to index the test site, which isn't great!

Would simply adding a robots.txt file to the root of the test site that blocks everything be good enough, or will I also have to put the noindex, nofollow meta tags on every page of the test site?

ThompsonPaul

You can do the inbound link check right here, using SEOMoz's Open Site Explorer tool, to find links to the dev site, whether it's on a subdomain, in a subfolder, or on a separate site.

Good luck!

Paul

spiralsites

That's a great help, cheers.

Where's the best place to do an inbound link check?

ThompsonPaul

You're actually up against a bit of a sticky wicket here, SS. You do need the no-index, no-follow meta tags on each page as Irving mentions.

HOWEVER! If you also add a robots.txt directive blocking the site, the search crawlers will not crawl your pages and therefore will never see the noindex meta tag telling them to remove the incorrectly indexed pages from their index.
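To make the ordering problem concrete, here is a minimal Python sketch using the standard library's `urllib.robotparser`, which evaluates robots.txt rules the way a compliant crawler would. The dev URL `dev.example.com` and the helper name `crawler_may_fetch` are hypothetical, chosen only for illustration. With a block-all robots.txt, the crawler is not allowed to fetch any page, so it never reads the HTML where the noindex tag lives.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical dev-site URL, used only for illustration.
DEV_PAGE = "https://dev.example.com/products/widget.html"

# A "block everything" robots.txt, as discussed in the thread.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def crawler_may_fetch(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if a robots.txt-compliant crawler may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # With "Disallow: /" the page is never fetched, so a noindex meta tag
    # inside its HTML is never read by the crawler.
    print("blocked robots.txt, fetch allowed:", crawler_may_fetch(BLOCK_ALL, DEV_PAGE))
    # With an empty robots.txt the page can be fetched and the tag seen.
    print("empty robots.txt, fetch allowed:", crawler_may_fetch("", DEV_PAGE))
```

This is why the noindex tags need to go in first and the robots.txt block only afterwards: while the block is in place, the noindex instruction is invisible to the crawler.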

My recommendation is for a belt & suspenders approach.

Implement the meta no-index, no-follow tags throughout the dev site, but do NOT implement the robots.txt exclusion yet. Wait a day or two until the pages get recrawled and the bots discover the noindex meta tags.
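The tag in question is `<meta name="robots" content="noindex, nofollow">` in each page's `<head>`. Before moving on to the robots.txt step, it can help to spot-check that the tag actually made it into the dev pages. Below is a minimal Python sketch using the standard library's `html.parser`. The helper name `robots_directives` and the sample page are hypothetical, and it only inspects HTML you pass in (fetching the pages is left out).

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # html.parser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        # content is a comma-separated list, e.g. "noindex, nofollow".
        for token in attr_map.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives found in an HTML page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


# Hypothetical dev page carrying the tag.
SAMPLE_DEV_PAGE = """<!DOCTYPE html>
<html><head>
<meta name="robots" content="noindex, nofollow">
<title>Dev copy</title>
</head><body><p>Test content</p></body></html>"""

if __name__ == "__main__":
    found = robots_directives(SAMPLE_DEV_PAGE)
    print("noindex present:", "noindex" in found)
```

Running it over each dev page's HTML and flagging any page where `"noindex"` is missing gives a quick checklist of pages still to fix.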

Use the Remove URL tools in both Google and Bing Webmaster Tools to request removal of all the dev pages you know have been indexed.

Then add the exclusion directive to the robots.txt file to keep the crawlers out from then on (leaving the no-index, no-follow tags in place).

Check back in the SERPs periodically to make sure no other dev pages have been indexed. If they have, submit another manual removal request.

Does that make sense?

Paul

P.S. As an additional measure, run an inbound link check on the dev pages that got indexed to find out which external pages are linking to them. Get those inbound links removed ASAP so the search engines aren't getting any signals to index the dev site. The last resort would be to simply password-protect the directory the dev site is in. It's a little less convenient, but guaranteed to keep the crawlers out.
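Password protection is usually set up in the web server itself (for example, an `.htaccess` file on Apache). Purely to illustrate the mechanism, here is a minimal Python sketch that assumes the dev site happens to be served by a Python WSGI app. It wraps the app so every request without valid HTTP Basic credentials gets a 401, which is all a crawler will ever see. The function name `require_basic_auth` and the credentials are hypothetical.

```python
import base64
import hmac


def require_basic_auth(app, username: str, password: str, realm: str = "Dev site"):
    """Wrap a WSGI app so every request must carry HTTP Basic credentials."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    # Simplification: only the exact "Basic <token>" form is accepted,
    # although the scheme name is case-insensitive per the HTTP spec.
    expected = f"Basic {token}".encode("ascii")

    def wrapped(environ, start_response):
        supplied = environ.get("HTTP_AUTHORIZATION", "").encode("utf-8")
        # Constant-time comparison to avoid leaking the token via timing.
        if hmac.compare_digest(supplied, expected):
            return app(environ, start_response)
        # Crawlers (and anyone else without credentials) stop here.
        start_response("401 Unauthorized", [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("WWW-Authenticate", f'Basic realm="{realm}"'),
        ])
        return [b"Authentication required.\n"]

    return wrapped
```

To try it locally, wrap your app (e.g. `protected = require_basic_auth(app, "dev", "change-me")`) and serve `protected` with `wsgiref.simple_server.make_server`. Unlike robots.txt, this does not depend on the crawler choosing to obey anything.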

spiralsites

Cheers, I thought as much.

irvingw

You cannot rely on robots.txt alone; you need to add the meta noindex tag to the pages as well to make sure they don't get indexed.
