Is the robots meta tag more reliable than robots.txt at preventing indexing by Google?

McTaggart

What's your experience of using the robots meta tag vs. robots.txt as a standalone solution to prevent Google indexing?

I am pretty sure the robots meta tag is more reliable. Going on my own experience, I have never had any problems with robots meta tags but plenty with robots.txt as a standalone solution.

Thanks in advance, Luke

Bobbi_Tschumper

Hi there,

Regarding the X-Robots-Tag: we have had a couple of sites that were disallowed in robots.txt have their PDF, DOC and other files indexed. I understand the reasoning for this. I would like to remove the disallow in robots.txt and use the X-Robots-Tag to noindex all pages as well as PDF, DOC and similar files. This is for an nginx configuration. Does anyone know what the X-Robots-Tag directive would look like in this case?

BlueprintMarketing

Test for what works for your site.

Use the tools below:

https://www.deepcrawl.com/ (will give you one free full crawl)
https://www.screamingfrog.co.uk/seo-spider/ (free up to 500 URLs)
http://urlprofiler.com/ (14-day free trial)

So much info

https://www.deepcrawl.com/blog/tag/robots-txt/

Thomas

BlueprintMarketing

Hi Luke,

In order to exclude individual pages from search engine indices, the noindex meta tag is actually superior to robots.txt.

The X-Robots-Tag HTTP header is the best option, but it is much harder to use.

Block all web crawlers from all content

User-agent: *
Disallow: /

Using the robots.txt file, you can tell a spider where it cannot go on your site. You cannot tell a search engine which URLs it cannot show in the search results. This means that not allowing a search engine to crawl a URL – called “blocking” it – does not mean that URL will not show up in the search results. If the search engine finds enough links to that URL, it will include it; it will just not know what’s on that page.

If you want to reliably block a page from showing up in the search results, you need to use a meta robots noindex tag. That means the search engine has to be able to crawl that page and find the noindex tag, so the page should not be blocked by robots.txt.
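To make the crawl-versus-index distinction concrete, here is a minimal Python sketch using the standard library's urllib.robotparser. The rules, the example.com URLs and the can_crawl helper are made up for illustration. It only shows what a well-behaved crawler is allowed to fetch, not what Google will actually index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def can_crawl(robots_txt, url, user_agent="Googlebot"):
    """Return True if these robots.txt rules let the crawler fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Disallowed: the crawler never fetches the page, so it never sees a noindex tag.
print(can_crawl(ROBOTS_TXT, "https://example.com/private/report.html"))  # False
# Allowed: the crawler can fetch the page and read a noindex tag on it.
print(can_crawl(ROBOTS_TXT, "https://example.com/index.html"))  # True
```

A URL that comes back False can still end up in the index if it is linked from elsewhere, and any noindex tag on it goes unread because the page is never fetched.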

In a nutshell, what a robots.txt file does is tell search engines not to crawl a particular page, file or directory of your website.

Using this helps both you and search engines such as Google. By not providing access to certain, unimportant areas of your website, you can save on your crawl budget and reduce load on your server.

Please note that using the robots.txt file to hide your entire website from search engines is definitely not recommended.

see big photo: http://i.imgur.com/MM7hM4g.png


_(…)_

_(…)_

The robots meta tag in the above example instructs all search engines not to show the page in search results. The value of the name attribute (robots) specifies that the directive applies to all crawlers. To address a specific crawler, replace the robots value of the name attribute with the name of the crawler that you are addressing. Specific crawlers are also known as user-agents (a crawler uses its user-agent to request a page). Google's standard web crawler has the user-agent name Googlebot. To prevent only Googlebot from indexing your page, update the tag as follows:

<meta name="googlebot" content="noindex">

This tag now instructs Google (but no other search engines) not to show this page in its web search results. Both the name and the content attributes are non-case sensitive.

Search engines may have different crawlers for different properties or purposes. See the complete list of Google's crawlers. For example, to show a page in Google's web search results, but not in Google News, use the following meta tag:

<meta name="googlebot-news" content="noindex">

If you need to specify multiple crawlers individually, it's okay to use multiple robots meta tags:

If competing directives are encountered by our crawlers we will use the most restrictive directive we find.
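If you want to check which directives a given crawler would pick up from a page, here is a rough Python sketch using the standard library's html.parser. The RobotsMetaParser class and meta_directives function are hypothetical names for this example. It simply unions every directive from the generic robots tag and the crawler-specific tag, which only approximates the "most restrictive wins" behaviour described above.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect robots directives from <meta> tags that apply to one crawler."""

    def __init__(self, crawler="googlebot"):
        super().__init__()
        # "robots" addresses every crawler; the other name is crawler-specific.
        self.names = {"robots", crawler.lower()}
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        # Attribute values are compared case-insensitively.
        if attr.get("name", "").strip().lower() in self.names:
            for item in attr.get("content", "").split(","):
                if item.strip():
                    self.directives.add(item.strip().lower())


def meta_directives(html, crawler="googlebot"):
    """Return the set of robots directives in the HTML that apply to crawler."""
    parser = RobotsMetaParser(crawler)
    parser.feed(html)
    parser.close()
    return parser.directives


page = '<html><head><meta name="ROBOTS" content="NoIndex, nofollow"></head></html>'
print(sorted(meta_directives(page)))  # ['nofollow', 'noindex']
```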

This basically means that if you really want to hide something from the search engines, and thus from people using search, robots.txt won’t suffice.

Indexer directives

Indexer directives are directives that are set on a per page and/or per element basis. Up until July 2007, there were two directives: the microformat rel=”nofollow”, which means that that link should not pass authority / PageRank, and the Meta Robots tag.

With the Meta Robots tag, you can really prevent search engines from showing pages you want to keep out of the search results. The same result can be achieved with the X-Robots-Tag HTTP header. As described earlier, the X-Robots-Tag gives you more flexibility by also allowing you to control how specific file types are indexed.
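As a rough illustration of setting the header per file type, here is a minimal Python WSGI middleware sketch. It is not an nginx or Apache configuration, just the same idea expressed in application code. The noindex_files name and the extension list are assumptions made up for this example.

```python
# Assumed list of file types that should stay out of the index.
NOINDEX_EXTENSIONS = (".pdf", ".doc", ".docx")


def noindex_files(app, extensions=NOINDEX_EXTENSIONS):
    """Wrap a WSGI app so responses for matching paths carry X-Robots-Tag: noindex."""

    def middleware(environ, start_response):
        path = environ.get("PATH_INFO", "").lower()

        def patched_start_response(status, headers, exc_info=None):
            # Only add the header for the chosen file types.
            if path.endswith(extensions):
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)

        return app(environ, patched_start_response)

    return middleware
```

The URLs still have to be crawlable for this to work, since the crawler only sees the header when it is allowed to request the file.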

Example uses of the X-Robots-Tag

Using the `X-Robots-Tag` HTTP header

The X-Robots-Tag can be used as an element of the HTTP header response for a given URL. Any directive that can be used in a robots meta tag can also be specified as an X-Robots-Tag. Here's an example of an HTTP response with an X-Robots-Tag instructing crawlers not to index a page:

HTTP/1.1 200 OK
Date: Tue, 25 May 2010 21:42:43 GMT
_(…)_
**X-Robots-Tag: noindex**
_(…)_

Multiple X-Robots-Tag headers can be combined within the HTTP response, or you can specify a comma-separated list of directives. Here's an example of an HTTP header response which has a noarchive X-Robots-Tag combined with an unavailable_after X-Robots-Tag.

HTTP/1.1 200 OK
Date: Tue, 25 May 2010 21:42:43 GMT
_(…)_
**X-Robots-Tag: noarchive
X-Robots-Tag: unavailable_after: 25 Jun 2010 15:00:00 PST**
_(…)_

The X-Robots-Tag may optionally specify a user-agent before the directives. For instance, the following set of X-Robots-Tag HTTP headers can be used to conditionally allow showing of a page in search results for different search engines:

HTTP/1.1 200 OK
Date: Tue, 25 May 2010 21:42:43 GMT
_(…)_
**X-Robots-Tag: googlebot: nofollow
X-Robots-Tag: otherbot: noindex, nofollow**
_(…)_

Directives specified without a user-agent are valid for all crawlers. The section below demonstrates how to handle combined directives. Both the name and the specified values are not case sensitive.
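To check what a set of X-Robots-Tag headers means for a particular crawler, something like the following Python sketch could be used. The function names and the list of directives that carry their own value are assumptions for this example, and the parsing is deliberately simplified (it lowercases everything, including dates).

```python
# Directives that carry a "name: value" payload, so their leading "name:"
# must not be mistaken for a user-agent prefix (assumed list).
VALUED_DIRECTIVES = {"unavailable_after", "max-snippet",
                     "max-image-preview", "max-video-preview"}


def parse_x_robots_tag(header_values):
    """Map each user-agent ("*" means all crawlers) to its set of directives."""
    rules = {}
    for value in header_values:
        agent, rest = "*", value.strip()
        prefix, sep, remainder = rest.partition(":")
        token = prefix.strip().lower()
        # A single token before the first colon is read as a user-agent name,
        # unless it is a directive that takes its own value.
        if sep and token not in VALUED_DIRECTIVES and "," not in token and " " not in token:
            agent, rest = token, remainder
        for directive in rest.split(","):
            directive = directive.strip().lower()
            if directive:
                rules.setdefault(agent, set()).add(directive)
    return rules


def is_noindexed(header_values, crawler):
    """True if the headers tell this crawler not to index the URL."""
    rules = parse_x_robots_tag(header_values)
    directives = rules.get("*", set()) | rules.get(crawler.lower(), set())
    # "none" is shorthand for "noindex, nofollow".
    return bool(directives & {"noindex", "none"})


headers = ["googlebot: nofollow", "otherbot: noindex, nofollow"]
print(is_noindexed(headers, "googlebot"))  # False
print(is_noindexed(headers, "otherbot"))   # True
```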

I hope this helps,

Tom


GWMSEO

If you've recently added the "noindex" meta, get the page fetched in GWT. Google can't act if it doesn't see the tag.

LoganRay

Hi Luke,

It's a pretty common misconception that robots.txt will prevent indexing. Its only purpose is actually to prevent crawling; anything disallowed in there is still up for indexing if it's linked to elsewhere. If you want something deindexed, your best bet is the robots meta tag, but make sure you allow crawling of the URLs to give search engine bots an opportunity to see the tag.
