Robots.txt and robots meta
-
I have an odd situation. I have a CMS that has a global robots.txt which has the generic
User-Agent: *
Allow: /
I also have one CMS site that needs to never be indexed. I've read on various pages (like http://www.jesterwebster.com/robots-txt-vs-meta-tag-which-has-precedence/22 ) that robots.txt always wins over meta, but I have also read that robots.txt indicates spiderability whereas meta can control indexation. I just want the site to not be indexed. Can I leave the robots.txt as is and still put NOINDEX in the robots meta?
-
I see. Have you considered putting it behind an htpasswd?
-
I can control it (it's a custom piece of software) but it's not as easy a fix as adding a meta to the template.
The main problem is we have a junk TLD we use to test some new ideas off the live server (lets clients give us feedback) but it gets spidered and indexed and starts ranking for client sites before they're ready to live in their own TLD. This means we have to compete against ourselves (even with a 301). There's nothing sensitive or it would live behind a password.
-
Do you need to control access to the site beyond the SERPs? I would not rely on robots.txt to shield any sensitive data.
For a breakdown of robots.txt and robots meta tags, check out http://www.robotstxt.org/robotstxt.html and http://www.searchtools.com/robots/robots-meta.html/, and for a great post on using these standards in SEO, check out http://www.seomoz.org/blog/serious-robotstxt-misuse-high-impact-solutions
I am also concerned that you are unable to control your robots.txt! If your CMS doesn't let you do that and overwrites it when you change it manually, you have some major control problems on your hands that you should remedy.
-
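Blocking the site in robots.txt will not guarantee that it stays out of Google's index, because a blocked URL can still be indexed if other pages link to it. I think meta robots NOINDEX is what you want: it tells Google not to show your pages when someone searches for them. Crawlers can only see that tag on pages they are allowed to fetch, so leave your robots.txt as it is (Allow: /).
Keep in mind that Googlebot and other spiders will continue to visit your pages.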
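To check that combination on the test domain, here is a minimal sketch in Python using only the standard library. It assumes you already have the robots.txt text and a page's HTML as strings; the names `RobotsMetaParser` and `noindex_is_visible` and the host `test.example` are made up for illustration. It only tests whether a well-behaved crawler would be allowed to fetch the page and would then find a NOINDEX directive. It does not tell you what any particular search engine will actually do.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key.lower(): (value or "") for key, value in attrs}
        if attr.get("name", "").lower() == "robots":
            # content looks like "noindex, nofollow"
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def noindex_is_visible(robots_txt, html, url, agent="*"):
    """Return True if `agent` may crawl `url` AND the page carries noindex.

    Hypothetical helper: a page blocked in robots.txt is never fetched,
    so a crawler cannot read its noindex tag.
    """
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(agent, url):
        return False
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Example inputs (assumed for illustration, not taken from a real site)
ALLOW_ALL = "User-Agent: *\nAllow: /"
BLOCK_ALL = "User-agent: *\nDisallow: /"
PAGE = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
URL = "http://test.example/page"

if __name__ == "__main__":
    print(noindex_is_visible(ALLOW_ALL, PAGE, URL))
    print(noindex_is_visible(BLOCK_ALL, PAGE, URL))
```

With the generic allow-all robots.txt from the question, the check passes for a page carrying the tag. Switching to `Disallow: /` makes it fail, which is the case where the NOINDEX would never be read.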