When you add a robots.txt file to a website to block certain URLs, do they disappear from Google's index?
-
I have seen several websites recently that have far too many webpages indexed by Google because, for each blog post they publish, Google might index the following:
- www.mywebsite.com/blog/title-of-post
- www.mywebsite.com/blog/tag/tag1
- www.mywebsite.com/blog/tag/tag2
- www.mywebsite.com/blog/category/categoryA
- etc
My question is: if you add a robots.txt file that tells Google NOT to index pages in the "tag" and "category" folders, does that mean that the previously indexed pages will eventually disappear from Google's index? Or does it just mean that newly created pages won't get added to the index? Or does it mean nothing at all? Thanks for any insight!
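For reference, a robots.txt along those lines would look like the sketch below. The /blog/tag/ and /blog/category/ paths are assumed from the example URLs above; note that Disallow rules block crawling, which (as the answers below explain) is not quite the same thing as blocking indexing.

```
User-agent: *
Disallow: /blog/tag/
Disallow: /blog/category/
```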
-
Hi William
If the pages in question are:
1) Already indexed by Google: if you block them via robots.txt, they will still show up in search results, but the meta description will say something along the lines of "A description for this result is not available because of this site's robots.txt – learn more."
2) Not indexed by Google (for example, on a new site): Google won't crawl them, and the pages won't come up in search directly, BUT if some external sites link to the pages, they can still come up in the SERPs some time down the track.
Your best bet for keeping the pages out of the public SERP index is the meta robots tag: http://www.robotstxt.org/meta.html
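As a quick sanity check on what a given set of rules actually blocks, Python's standard-library urllib.robotparser can evaluate URLs against robots.txt directives. The domain and Disallow paths below are taken from the question's examples; this only tests crawl rules locally and says nothing about whether a URL stays in the index.

```python
from urllib.robotparser import RobotFileParser

# Rules matching the question's scenario: block the tag and category folders.
# RobotFileParser normally fetches a live file via set_url() + read(); here
# the rules are fed in directly as a list of lines.
rules = """
User-agent: *
Disallow: /blog/tag/
Disallow: /blog/category/
""".strip().splitlines()

parser = RobotFileParser()
parser.parse(rules)

# Post pages remain crawlable; tag/category pages are blocked.
print(parser.can_fetch("*", "http://www.mywebsite.com/blog/title-of-post"))  # True
print(parser.can_fetch("*", "http://www.mywebsite.com/blog/tag/tag1"))       # False
```

Remember that a False here only means Googlebot won't crawl the URL; an already-indexed or externally linked URL can still appear in results, which is why the meta robots tag is the safer tool for keeping pages out of the SERPs.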
-
William, if the pages in question are linked to from external resources, the robots.txt file will not prevent the pages from appearing in the index. Per Moz's Robots.txt and Meta Robots best practices, "the robots.txt tells the engines not to crawl the given URL, but that they may keep the page in the index and display it in results."
To prevent all robots from indexing a page on your site, place the following meta tag into the <head> section of your page:
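The standard noindex form of that tag, as documented on the robotstxt.org page linked above, is:

```html
<meta name="robots" content="noindex">
```

Unlike a robots.txt Disallow, this requires the engine to crawl the page so it can see the tag, but it then drops the page from the index entirely.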
Related Questions
-
How can I make sure pages with similar content don't damage each other's SEO?
I work for a travel company and I have a 'tour page' targeted for pre-booking and a 'booking pack page' for post-booking, with some similar content but with details such as hostel locations, meeting places and times, etc. I want to make sure the tour page keeps the authority, as this is what I want to rank in SEO. I've got a couple of similar problems to this across the site; there are a few pages on the site that are post-sale and don't really need to rank on Google, but it would be great if they could contribute to other pages' rankings. Thanks!
On-Page Optimization | nicolewretham
-
MOZ identifies duplicate titles - one has 'www' in the URL
MOZ has identified duplicate titles - one has 'www' in the URL. We have a few pieces of content where the same thing is happening. Not sure how this has happened. Should we do something about this? Will it cause problems for ranking?
| KETAMINE GUIDE FOR DRUG WORKERS - free | Harm reduction information | http://substance.org.uk/harm-reduction-information/ketamine-guide-for-drug-workers-free | 13 | 2 |
| KETAMINE GUIDE FOR DRUG WORKERS - free | Harm reduction information | http://www.substance.org.uk/harm-reduction-information/ketamine-guide-for-drug-workers-free | 13 | 4 |
On-Page Optimization | Substance-create
-
Does Google penalize you for reindexing multiple URLs?
Hello, just a quick question! I was wanting to know if multiple-page indexing (site overhaul) could cause a drop in organic traffic ranking, or be penalized by Google for submitting multiple pages at one time. Thanks
On-Page Optimization | InternetRep
-
Timeline on Moz's About Page
There has been a lot of talk about improving “About” pages on websites as of late. Moz actually has a really interesting About page, which includes a timeline. Are there any recommended WordPress plugins that can achieve a similar timeline effect?
On-Page Optimization | VicMarcusNWI
-
PDFs - Dupe Content
Hi, I have some PDFs linked to from a page with little content. Hence I'm thinking it's best to extract the copy from the PDF and have it on-page as body text, and the PDF will still be linked to. Will this count as dupe content? Or is it best to use a PDF plugin so the page opens the PDF automatically and hence gives the page content that way? Cheers, Dan
On-Page Optimization | Dan-Lawrence
-
Not using H1's with keywords to simulate natural non SEO'd content?
There has been a lot of talk lately about making a website seem like it is not SEO'd to avoid over optimization penalties with the recent Google Algorithmic updates. Has anyone come across the practice of not using Headings (H1's, H2's etc..) properly to simulate that the current webpage isn't over optimized? I've come across a site that used to use multiple keywords within their headings & now they are using none. In fact they are marking their company name & logo as an H1 and non keyworded H2's such as our work or Contact. Is anyone holding back on their old SEO tactics to not seem over optimized to Google? Thanks!
On-Page Optimization | DCochrane
-
New CMS system - 100,000 old urls - use robots.txt to block?
Hello. My website has recently switched to a new CMS system. Over the last 10 years or so, we've used 3 different CMS systems on our current domain. As expected, this has resulted in lots of URLs. Up until this most recent iteration, we were unable to 301 redirect or use any page-level indexation techniques like rel='canonical'. Using SEOmoz's tools and GWMT, I've been able to locate and redirect all pertinent, PageRank-bearing, "older" URLs to their new counterparts. However, according to Google Webmaster Tools' 'Not Found' report, there are literally over 100,000 additional URLs out there it's trying to find. My question is, is there an advantage to using robots.txt to stop search engines from looking for some of these older directories? Currently, we allow everything, only using page-level robots tags to disallow where necessary. Thanks!
On-Page Optimization | Blenny
-
Should I let Google index tags?
Should I let Google index tags? Positives? Negatives? Right now Google indexes every page, including tags... looks like I am risking duplicate content errors? If that's true, should I just block /tag in robots.txt? Also, is it better to have as many pages as possible indexed by Google, or should it be as few as possible and as specific to the content as possible? Cheers
On-Page Optimization | DiamondJewelryEmpire