Will disallowing URLs in the robots.txt file stop those URLs being indexed by Google?
-
I found a lot of duplicate title tags showing in Google Webmaster Tools. When I visited the URLs that these duplicates belonged to, I found that they were just images from a gallery that we didn't particularly want Google to index. There is no benefit to the end user in these image pages being indexed in Google.
Our developer has told us that these URLs are created by a module and are not "real" pages in the CMS.
They would like to add the following to our robots.txt file:
Disallow: /catalog/product/gallery/
QUESTION: If these pages are already indexed by Google, will this adjustment to the robots.txt file help to remove the pages from the index?
We don't want these pages to be found.
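The proposed rule can be checked locally before it goes live. The following is a minimal sketch using Python's standard-library robots.txt parser, not Google's own parser. The `User-agent: *` line and the example.com URLs are assumptions added for illustration; the thread only shows the `Disallow` line. Note that this only tells you whether a URL is blocked from crawling, which is not the same as whether it is removed from the index.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt: only the Disallow line comes from the question.
# The "User-agent: *" group header is an assumption so the rule applies to all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /catalog/product/gallery/
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may crawl `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Hypothetical URLs on a placeholder domain.
    for url in (
        "https://www.example.com/catalog/product/gallery/id/123/",
        "https://www.example.com/catalog/product/view/id/123/",
    ):
        print(url, "crawlable:", is_crawlable(ROBOTS_TXT, url))
```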
-
That's why I mentioned: "eventually". But thanks for the added information. Hopefully it's clear now for the original poster.
-
Looking at this video - https://www.youtube.com/watch?v=KBdEwpRQRD0&feature=youtu.be - Matt Cutts advises using the noindex tag on every individual page. However, this is very time consuming if you're dealing with a large volume of pages.
The other option he recommends is to use the robots.txt file as well as the URL removal tool in GWMT. Although this is the second-choice option, it does seem easier for us to implement than the noindex tag.
-
Hi,
Yes, if you disallow a URL in the robots.txt, it will stop being shown in the search results after some time, even if the page was already indexed. When you disallow URLs in the robots.txt, Google will stop crawling those pages and will eventually stop indexing them.
-
Hi Nico
Great response, thanks.
This is certainly something I'm taking into consideration and will question my developer about this.
-
Thanks Thomas.
I'm now finding out from my developer if we are able to noindex these pages with the meta robots tag.
If this isn't possible, it's likely that we'll add the rule to the robots.txt as you did.
Either way, I think it will be progress, to different degrees.
-
I don't think Martijn's statement is quite correct, as I have had a different experience in an accidental experiment. Crawling is not the same as indexing. Google will put pages it cannot crawl into the index ... and they will stay there unless removed somehow. They will probably only show up for specific searches, though.
Completely agree. I have done the same for a website I am working on. Ideally we would noindex with meta robots, but that isn't possible, so instead we added the URLs to the robots.txt. The number of indexed pages has dropped, yet when you search for a page exactly, the result just says the description can't be reached.
So I was happy with the results as they're now not ranking for the terms they were.
-
I don't think Martijn's statement is quite correct, as I have had a different experience in an accidental experiment. Crawling is not the same as indexing. Google will put pages it cannot crawl into the index ... and they will stay there unless removed somehow. They will probably only show up for specific searches, though.
In September 2015 I catapulted a website from ~3.000 to 130.000 indexed pages (roughly). 127.000 were essentially canonicalised duplicates (yes, it did make sense) but also blocked by robots.txt - but put into the index nonetheless. The problem was a dynamically generated parameter, always different, always blocked by robots.
The title was equal to the link text; the description became "A description for this result is not available because of this site's robots.txt – learn more." (If Google cannot crawl a URL Google will usually take titles from links pointing to that URL). No sign of disappearing. In fact, Google was happy to add more and more to its index ...
At the start of December 2015 I removed the robots.txt block, so Google could now read the canonicals or noindex on the URLs. The pages only began dropping out in March 2016, slowly and in bunches of a few thousand, probably due to the very low relevancy and crawl budget assigned to them. Right now there are still about 24.000 pages in the index.
So my answer would be: No - disabling crawling in the robots.txt will NOT remove a page from the index. For that you need to noindex them (which sometimes also works if done in robots.txt, I've heard). Disallowing URLs in the robots.txt will very likely drop pages to the end of useful results, though, as Andy described. (I don't know if this has any influence on the general evaluation of the site as a whole; I'd guess not.)
Regards
Nico
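Nico's point, that a page has to be crawlable for Google to see a noindex, and the concern raised earlier about tagging every page by hand, can both be addressed by sending the noindex directive as an `X-Robots-Tag` HTTP response header for a whole URL prefix. The following is only a sketch under assumptions: it is written as a generic Python WSGI middleware, whereas the site in question runs a CMS that may need the same thing done in its own language or in the web server config. The name `noindex_middleware` is hypothetical. For the header to be read, the gallery path must not be disallowed in robots.txt.

```python
# Path prefix taken from the question; everything else here is illustrative.
NOINDEX_PREFIX = "/catalog/product/gallery/"


def noindex_middleware(app, prefix: str = NOINDEX_PREFIX):
    """Wrap a WSGI app so responses under `prefix` carry 'X-Robots-Tag: noindex'."""

    def wrapped(environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            # Only add the header for URLs under the gallery prefix.
            if environ.get("PATH_INFO", "").startswith(prefix):
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)

        return app(environ, patched_start_response)

    return wrapped


def demo_app(environ, start_response):
    """Stand-in for the real application (hypothetical)."""
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<html><body>gallery image page</body></html>"]


# The wrapped app can be served by any WSGI server.
application = noindex_middleware(demo_app)
```

The design choice is to decide by URL prefix in one place rather than editing each page template, which is why it suits pages generated by a module.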
-
Thanks Martijn. This is what I was assuming would happen. However, I got a confusing message from my developer, which said the following:
"won't remove the URL's from the index but it will mean that they will only show up for very specific searches that customers are extremely unlikely to use. It will also increase Asgard's crawl budget as Google and Bing won't try to crawl these URLs. Would you be happy with this solution?"
I would tend to still agree with your statement though.
-
Yes, they will be eventually. Since you disallow Google from crawling the URLs, it will probably start hiding the descriptions for some of these image pages soon, as it can't crawl them anymore. Then at some point it will stop looking at them at all.