Will disallowing URL's in the robots.txt file stop those URL's being indexed by Google
-
I found a lot of duplicate title tags showing in Google Webmaster Tools. When I visited the URL's that these duplicates belonged to, I found that they were just images from a gallery that we didn't particularly want Google to index. There is no benefit to the end user in these image pages being indexed in Google.
Our developer has told us that these urls are created by a module and are not "real" pages in the CMS.
They would like to add the following to our robots.txt file
Disallow: /catalog/product/gallery/
QUESTION: If the these pages are already indexed by Google, will this adjustment to the robots.txt file help to remove the pages from the index?
We don't want these pages to be found.
-
That's why I mentioned: "eventually". But thanks for the added information. Hopefully it's clear now for the original poster.
-
Looking at this video - https://www.youtube.com/watch?v=KBdEwpRQRD0&feature=youtu.be Matt Cutts advises to use the noindex tag on every individual page. However, this is very time consuming if you're dealing wit a large volume of pages.
The other option he recommends is to use the robots.txt file as well as the URL removal tool in GWMT, Although this is the second choice option, it does seem easier for us to implement than the noindex tag.
-
Hi,
Yes, if you put any url in the robots.txt it will not be shown in the search results after some time even if your pages were already indexed. Because when your disallow urls in the robots.txt , Google will stop crawling that page and eventually will stop indexing those pages.
-
Hi Nico
Great response thanks.
This is certainly something I'm taking into consideration and will question my developer about this.
-
Thanks Thomas.
I'm now finding out from my developer is we are able to noindex these pages with the meta robots.
If this is something that isn't possible, it's likely that we'll add to the robots.txt as you did.
Either way I think will be progress to different degrees.
-
I don' think Martijn's statement is quite correct as I have made different experiences in an accidental experiment. Crawling is not the same as indexing. Google will put pages it cannot crawl into the index ... and they will stay there unless removed somehow. They will probably only show up for specific searches, though
Completely agree, I have done the same for a website I am doing work with, ideally we would noindex with meta robots however that isn't possible. So instead we added to the robots.txt, the number of indexed pages have dropped, yet when you search exactly it just says the description can't be reached.
So I was happy with the results as they're now not ranking for the terms they were.
-
I don' think Martijn's statement is quite correct as I have made different experiences in an accidental experiment. Crawling is not the same as indexing. Google will put pages it cannot crawl into the index ... and they will stay there unless removed somehow. They will probably only show up for specific searches, though
In September 2015 I catapulted a website from ~3.000 to 130.000 indexed pages (roughly). 127.000 were essentially canonicalised duplicates (yes, it did make sense) but also blocked by robots.txt - but put into the index nonetheless. The problem was a dynamically generated parameter, always different, always blocked by robots.
The title was equal to the link text; the description became "A description for this result is not available because of this site's robots.txt – learn more." (If Google cannot crawl a URL Google will usually take titles from links pointing to that URL). No sign of disappearing. In fact, Google was happy to add more and more to its index ...
At the start of December 2015 I removed the robots.txt block - Google could now read the canonicals or noindex on the URLs ... the pages only began dropping out, slowly and in bunches of a few thousand in March 2016 - probably due to the very low relevancy and crawl budget assigned to them. Right now there are still about 24.000 pages in the index.
So my answer would be: No - disabling crawling in the robots.txt will NOT remove a page from the index. For that you need to noindex them (which sometimes also works if done in robots.txt, I've heard). Disallowing URLs in the robots.txt will very likely drop pages to the end of useful results, though, as Andy described. (I don't know if this has any influence on the general evaluation of the site as a whole; I'd guess not.)
Regards
Nico
-
Thanks Martijn. This is what I was assuming would happen. However, I got a confusing message from my developer which said the following,
"won't remove the URL's from the index but it will mean that they will only show up for very specific searches that customers are extremely unlikely to use. It will also increase Asgard's crawl budget as Google and Bing won't try to crawl these URLs. Would you be happy with this solution?"
I would tend to still agree with your statement though.
-
Yes they will be eventually. As you disallow Google to crawl the URLs it will probably start hiding the descriptions for some of these image pages soon as they can't crawl them anymore. Then at some point they'll stop looking at them at all.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Can 'follow' rather than 'nofollow' links be damaging partner's SEO
Hey guys and happy Monday! We run a content rich website, 12+ years old, focused on travel in a specific region, and advertisers pay for banners/content etc alongside editorial. We have never used 'nofollow' website links as they're no explicitly paid for by clients, but a partner has asked us to make all links to them 'nofollow' as they have stated the way we currently link is damaging their SEO. Could this be true in any way? I'm only assuming it would adversely affect them if our website was peanalized by Google for 'selling links', which we're not. Perhaps they're just keen to follow best practice for fear of being seen to be buying links. FYI we now plan to change to more full use of 'nofollow', but I'm trying to work out what the client is refering to without seeming ill-informed on the subject! Thank you for any advice 🙂
Intermediate & Advanced SEO | | SEO_Jim0 -
Robots.txt - Googlebot - Allow... what's it for?
Hello - I just came across this in robots.txt for the first time, and was wondering why it is used? Why would you have to proactively tell Googlebot to crawl JS/CSS and why would you want it to? Any help would be much appreciated - thanks, Luke User-Agent: Googlebot Allow: /.js Allow: /.css
Intermediate & Advanced SEO | | McTaggart0 -
Is it a good or bad idea (in Google's eyes) to add a forum to my website?
I have an active website with many users adding dozens of comments on the many pages of the site daily. I'm am wondering if it would be good for the overall ranking strength of the site if I were to add a forum to it (in a subdirectory, like forum.mysite.com). On one hand, I can see the forum posts as thin content, which Google wouldn't care for. On the other hand, I see the additional user engagement on the site, which I think Google would like. I know the benefits it can have to the users, but for this question, all I want to know is if this would be seen by Google as a plus or a minus for my site, assuming the forum succeeded in becoming popular. I don't want to do anything that will diminish the value of my site in Google's eyes. Thank you.
Intermediate & Advanced SEO | | bizzer0 -
Will these 301's get me penalized?
Hey everyone, We're redesigning parts of our site and I have a tricky question that I was hoping to get some sound advice about. We have a blog (magazine) with subcategory pages that are quite thin. We are going to restructure the blog (magazine) and feature different concert and have new subcategories. So we are trying to decide where to redirect the existing subcategory pages, e.g. Entertainment, Music, Sports, etc. www.charged.fm/magazine Our new ticket category pages ( Concert Tickets, NY Yankees Tickets, OKC Thunder Tickets, etc) are going to feature a tab called 'Latest News' where we are thinking of 301 redirecting the old magazine subcategory pages. So Sports News from the blog would 301 to Sports Tickets (# Latest News tab). See screenshot below for example. So my question is: Will this look bad in the eyes of the GOOG? Are these closely related enough to redirect? Are there any blatant pitfalls that I'm not seeing? It seems like a win/win because we are making a rich Performer page with News, Bio, Tickets and Schedule and getting to reallocate the link juice that was being wasted in an pretty much useless page that was allowed to become to powerful. Gotta keep those pages in check! Thoughts appreciated. Luke Cn6HPpH.jpg
Intermediate & Advanced SEO | | keL.A.xT.o0 -
Why is this site not indexed by Google?
Hi all and thanks for your help in advance. I've been asked to take a look at a site, http://www.yourdairygold.ie as it currently does not appear for its brand name, Your Dairygold on Google Ireland even though it's been live for a few months now. I've checked all the usual issues such as robots.txt (doesn't have one) and the robots meta tag (doesn't have them). The even stranger thing is that the site does rank on Yahoo! and Bing. Google Webmaster Tools shows that Googlebot is crawling around 150 pages a day but the total number of pages indexed is zero. It does appear if you carry out a site: search on Google however. The site is very poorly optimised in terms of title tags, unnecessary redirects etc which I'm working on now but I wondered if you guys had any further insights. Thanks again for your help.
Intermediate & Advanced SEO | | iProspect-Ireland0 -
Don't affiliate programs have an unfair impact on a company's ability to compete with bigger businesses?
So many coupon sites and other websites these days will only link to your website if you have a relationship with Commission Junction or one of the other large affiliate networks. It seems to me that links on these sites are really unfair as they allow businesses with deep pockets to acquire links unequitably. To me it seems like these are "paid links", as the average website cannot afford the cost of running an affiliate program. Even worse, the only reason why these businesses are earning a link is because they have an affiliate program; that to me should violate some sort of Google rule about types and values of links. The existence of an affiliate program as the only reason for earning a link is preposterous. It's just as bad as paid link directories that have no editorial standards. I realize the affiliate links are wrapped in CJ's code, so that mush diminish the value of the link, but there is still tons of good value in having the brand linked to from these high authority sites.
Intermediate & Advanced SEO | | williamelward0 -
Tool to calculate the number of pages in Google's index?
When working with a very large site, are there any tools that will help you calculate the number of links in the Google index? I know you can use site:www.domain.com to see all the links indexed for a particular url. But what if you want to see the number of pages indexed for 100 different subdirectories (i.e. www.domain.com/a, www.domain.com/b)? is there a tool to help automate the process of finding the number of pages from each subdirectory in Google's index?
Intermediate & Advanced SEO | | nicole.healthline0 -
URL Length or Exact Breadcrumb Navigation URL? What's More Important
Basically my question is as follows, what's better: www.romancingdiamonds.com/gemstone-rings/amethyst-rings/purple-amethyst-ring-14k-white-gold (this would fully match the breadcrumbs). or www.romancingdiamonds.com/amethyst-rings/purple-amethyst-ring-14k-white-gold (cutting out the first level folder to keep the url shorter and the important keywords are closer to the root domain). In this question http://www.seomoz.org/qa/discuss/37982/url-length-vs-url-keywords I was consulted to drop a folder in my url because it may be to long. That's why I'm hesitant to keep the bradcrumb structure the same. To the best of your knowldege do you think it's best to drop a folder in the URL to keep it shorter and sweeter, or to have a longer URL and have it match the breadcrumb structure? Please advise, Shawn
Intermediate & Advanced SEO | | Romancing0