Meta NoIndex tag and Robots Disallow
-
Hi all,
I hope you can spend some time to answer my first of a few questions
We are running a Magento site - layered/faceted navigation nightmare has created thousands of duplicate URLS!
Anyway, during my process to tackle the issue, I disallowed in Robots.txt anything in the querystring that was not a p (allowed this for pagination).
After checking some pages in Google, I did a site:www.mydomain.com/specificpage.html and a few duplicates came up along with the original with
"There is no information about this page because it is blocked by robots.txt"So I had added in Meta Noindex, follow on all these duplicates also but I guess it wasnt being read because of Robots.txt.
So coming to my question.
-
Did robots.txt block access to these pages? If so, were these already in the index and after disallowing it with robots, Googlebot could not read Meta No index?
-
Does Meta Noindex Follow on pages actually help Googlebot decide to remove these pages from index?
I thought Robots would stop and prevent indexation? But I've read this:
"Noindex is a funny thing, it actually doesn’t mean “You can’t index this”, it means “You can’t show this in search results”. Robots.txt disallow means “You can’t index this” but it doesn’t mean “You can’t show it in the search results”.I'm a bit confused about how to use these in both preventing duplicate content in the first place and then helping to address dupe content once it's already in the index.
Thanks!
B
-
-
There's no real way to estimate how long the re-crawl will take, Ben. You can get a bit of an idea by looking at the crawl rate reported in Google Webmaster Tools.
Yes, asking for a page fetch then submitting with linked pages for each of the main website sections can help speed up the crawl discovery. In addition, make sure you've submitted a current sitemap and it's getting found correctly (also reported in GWT) You should also do the same in Bing Webmaster Tools. Too many sites forget about optimizing for Bing - even if it's only 20% of Google's traffic, there's no point throwing it away.
Lastly, earning some new links to different sections of the site is another great signal. This can often be effectively & quickly done using social media - especially Google+ as it gets crawled very quickly.
As far as your other question - yes, once you get the unwanted URLs out of the index, you can add the robots.txt disallow back in to optimise your crawl budget. I would strongly recommend you leave the meta-robots no-index tag in place though as a "belt & suspenders" approach to keep pages linking into those unwanted pages from triggering a re-indexing. It's OK to have both in place as long as the de-indexing has already been accomplished, as we've discussed.
Hope that answer your questions?
Paul
-
So once Google has started to see the meta-noindex and is slowly deindexing pages, once that is done, I would like to block it from crawling them with a robots.txt to conserve my crawl budget.
But, there are still internal links on the site that point to these URL´s - would they get back into the index in this case?
-
Hi Paul,
Thank you for your detailed answer - so I'm not going crazy
I did try with canonicals but then realized they are more of a suggestion as opposed to a directive and I am still correcting a lot of dupe content and 404's so I am imagining that Google view's the site as "these guys don't know what they are doing' so may have ignored the canonical suggestion.
So what I have done is remove the robots block on the pages I want de-indexed and add in meta noindex, follow on these pages - From what you are saying, they should naturally de-index, after which, I will put the robots.txt block back on to keep my crawl budget spent on better areas of the site.
How long in your opinion can it take for Googlebot to de-index the pages? Can I help it along at all to speed up? Fetch page and linking pages as Googlebot?
Thanks again,
Ben
-
You're right to be confused, B. The terminology is unfortunate and misleading.
To answer your questions
1. Yes
2. Yes.
A disallow in robots.txt does nothing to remove already-indexed pages. That's not its purpose. Its only purpose is to tell the search crawlers not to waste their time crawling those pages. Even if pages have been blocked in robots, they will remain in the index if already there. Even if never crawled, and blocked in robots.txt, they can still end up indexed if some other indexed page links to them and the crawlers find those pages by following links. Again, nothing in a robots.txt disallow tells the engines to remove a page from the index, just not to waste time crawling it.
Put another way, the robots.txt disallow directive only disallows crawling - it says nothing about what to do if the page gets into the index in other ways.
The meta-robots no-index tag however explicitly states to the crawler "if you arrive at this page, do not add it to the index. If it is already in the index, remove it".
And yea - as you suspected - if pages are blocked in robots.txt, the crawler obeys and doesn't visit those pages So it can't discover the no-index command to drop them from the index. Thus the only way a page could get dropped is if a crawler followed a link from an external site and discovered the page that way. A very inefficient way of trying to get all those pages out of the index.
Bottom line - robots.txt is never the correct tool to deal with duplicate content issues. It's sole purpose is to keep the crawlers from wasting time on unimportant pages so they can spend more time finding (and therefore indexing) more important pages.
The three tools for dealing with duplicate content are meta-robots no-index tags in a page header, 301 redirects, and canonical tags. Which one to use depends on the architecture of your site, your intended purpose, and the site's technical limitations.
Hope that makes sense?
Paul
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Looking for opinions on structuring meta title tags/page title/menu title/H1
Hi everyone I am hoping a few of you can share your opinions. I have been having conversations (okay, healthy debates) about how to write/structure meta title tag and how to compliment them with the H1, page title, menu name. To help explain the thought processes I will use a pretend keyword. How about "screwdriver". Case: (I made this up) we are redesigning a website for a construction tools manufacturing company (pretend name: ABC Tools) targeting OEMs who are interested in purchasing large quantities of tools. The product categories (to become main menu items) are Screwdrivers, Nails, Drills, and Hammers. (bear with me .... this is just an example I am making up on the fly) K. Circling back to screwdrivers - let's say we have one landing page (a primary category page and in the main menu) listing products and great details about screwdrivers. Focus keywords are screwdriver manufacturer, screwdriver supplier, construction screwdrivers Below are questions being debated. If you are willing ... how would you address these questions? And, can you explain WHY? QUESTION ONE: How would you structure the meta title tag (feel free to write one of your own) Screwdriver Manufacturer - Construction Screwdriver | ABC Tools ABC Tools - US-based Screwdriver Manufacturer Supplier Near You High-Quality Screwdrivers for Construction with ABC Tools QUESTION TWO: how would you write the H1 on the page? Would it match the meta tag? OR, would you write something different using the primary keyword? QUESTION THREE Remembering this is not a blog post ... it is a primary landing page linked to the main navigation. What would the menu title be? (remember the product categories above are how the main menu items are bucketed) Screwdrivers Screwdriver Manufacturer Typically in WordPress, the H1 and the menu title is auto-populated using the page title (not the title tag)... So, if we use Screwdrivers as the page title but we want the H1 to match the meta title tag, would we manually change the H1? Or, have the page title and title tag match, but manually change the menu item?
Intermediate & Advanced SEO | | Brenda.Haines1 -
Should I add no-follow tags to my widget links?
Matt Cutts recommended in a video in 2013 to add rel="nofollow" on widget links that link back to your website. Some background of my company: We're a software company for website chat. There's a 'powered by' link in our widgets that links back from our users' websites to our website. Currently these are all follow links. I checked out the links of our competitors, and it seems none of them have no follow on their widget backlinks. This, together with the fact that the video is quite old and information on this issue rather scarce, makes me doubt whether we should change our widget backlinks to no follow. Does anyone have thoughts on this?
Intermediate & Advanced SEO | | Maximuxxx0 -
Mobile Meta Descriptions
Hi we have a e-commerce site on Magento. A lot of the current current meta descriptions are over 120 characters, which is approximately what Google cuts off for mobile search. We want to create mobile meta descriptions but where would we add them to the CMS and how do we tell Google to use the mobile meta description when the site is responsive. Any suggestions would be very much appreciated! Thanks, Chris
Intermediate & Advanced SEO | | jayoliverwright0 -
Meta NOINDEX... how long before Google drops dupe pages?
Hi, I have a lot of near dupe content caused by URL params - so I have applied: How long will it take for this to take effect? It's been over a week now, I have done some removal with GWT removal tool, but still no major indexed pages dropped. Any ideas? Thanks, Ben
Intermediate & Advanced SEO | | bjs20100 -
Files blocked in robot.txt and seo
I use joomla and I have blocked the following in my robots.txt is there anything that is bad for seo ? User-agent: * Disallow: /administrator/ Disallow: /cache/ Disallow: /components/ Disallow: /images/ Disallow: /includes/ Disallow: /installation/ Disallow: /language/ Disallow: /libraries/ Disallow: /media/ Disallow: /modules/ Disallow: /plugins/ Disallow: /templates/ Disallow: /tmp/ Disallow: /xmlrpc/ Disallow: /mailto:myemail@myemail.com/ Disallow: /javascript:void(0) Disallow: /.pdf
Intermediate & Advanced SEO | | seoanalytics0 -
Why is Alt tag occurances so high in the on page reporter?
I am trying to understand why the on page reporter shows so many occurrences of alt tags containing my keywords. Can anyone shed any light? This is the URL that's in question. http://www.towelsrus.co.uk/towels-hand-towels/aztex/turkish-cotton-hand-towel_ct472bd182pd2745.htm yMlM5.png
Intermediate & Advanced SEO | | Towelsrus0 -
Why should I add URL parameters where Meta Robots NOINDEX available?
Today, I have checked Bing webmaster tools and come to know about Ignore URL parameters. Bing webmaster tools shows me certain parameters for URLs where I have added META Robots with NOINDEX FOLLOW syntax. I can see canopy_search_fabric parameter in suggested section. It's due to following kind or URLs. http://www.vistastores.com/patio-umbrellas?canopy_fabric_search=1728 http://www.vistastores.com/patio-umbrellas?canopy_fabric_search=1729 http://www.vistastores.com/patio-umbrellas?canopy_fabric_search=1730 http://www.vistastores.com/patio-umbrellas?canopy_fabric_search=2239 But, I have added META Robots NOINDEX Follow to disallow crawling. So, why should it happen?
Intermediate & Advanced SEO | | CommercePundit0 -
Can you use more than one meta robots tag per page?
If you want to add both "noindex, follow" and "noopd" should you add two meta robots tags or is there a way to combine both into one?
Intermediate & Advanced SEO | | nicole.healthline0