Pages getting into Google Index, blocked by Robots.txt??
-
Hi all,
So yesterday we set about removing URLs that got into the Google index but were not supposed to be there, due to faceted navigation... We found the URLs by running these searches in Google:
site:www.sekretza.com inurl:price=
site:www.sekretza.com inurl:artists=
That brings up a list of "duplicate" pages, and they have the usual: "A description for this result is not available because of this site's robots.txt – learn more."
So we submitted them all for removal, and Google removed every single one.
This morning I did a check and found that more are creeping in. If I take one of the suspected dupes to the robots.txt Tester, Google tells me it's blocked, and yet it's appearing in their index??
I'm confused as to why a path that is blocked is able to get into the index?? I'm thinking of lifting the robots.txt block so that Google can see that these pages also have a meta NOINDEX,FOLLOW tag, but surely that will waste my crawl budget on unnecessary pages?
Any ideas?
thanks.
-
Oh, OK. If that's the case, please don't worry about those in the index. You can get them removed using the Remove URLs feature in your Webmaster Tools account.
-
It doesn't show any result for the "blocked page" when I do that in Google.
-
Hi,
Please try this and let us know the results:
Suppose this is one of the pages in discussion:
http://www.yourdomain.com/blocked-page.html
Go to Google and type the following, including the double quotes. Replace the example with the actual page:
"yourdomain.com/blocked-page.html" -site:yourdomain.com
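If there are many pages to check, the query above can be generated rather than typed by hand. This is only a rough sketch: the function name `external_reference_query` is made up for illustration, and it assumes the only host cleanup needed is dropping a leading "www.", as in the example.

```python
from urllib.parse import urlparse


def external_reference_query(page_url: str) -> str:
    """Build a search query for mentions of page_url outside its own site.

    Hypothetical helper, not part of any Google tool.
    """
    parsed = urlparse(page_url)
    host = parsed.netloc
    # Drop a leading "www." so the query matches the form shown in the post.
    bare_host = host[4:] if host.startswith("www.") else host
    target = bare_host + parsed.path
    if parsed.query:
        target += "?" + parsed.query
    # Quoted URL text, minus results from the site itself.
    return f'"{target}" -site:{bare_host}'


print(external_reference_query("http://www.yourdomain.com/blocked-page.html"))
```

Paste the printed string into Google; any results would be third-party pages mentioning the blocked URL.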
-
Hi!
From what I could tell, there weren't that many pages already in the index, so it could be worth trying to lift the block, at least for a short while, to see if it has an impact.
In addition, how about configuring how Googlebot should treat your URLs via the URL parameter tool in Google Webmaster Tools? Here's what Google has to say about this: https://support.google.com/webmasters/answer/1235687
Best regards,
Anders
-
Hi Devanur.
My guess is that the problem is this: as of now, Googlebot is restricted from accessing the pages (because of robots.txt), so it never fetches the page and never updates its index with the "noindex, follow" declaration that seems to be in place.
One other thing that could be considered is adding rel="nofollow" to all the faceted navigation links on the left.
Fully agreeing with you on the "crawl budget" part.
Anders
-
Hi guys,
Appreciate your replies, but as far as I checked last time, if a URL is blocked by the robots.txt file, Google cannot read the meta noindex,follow tag within the page.
There are no external references to these URLs, so Google is finding them within the site itself.
In essence, what you are recommending is that I lift the robots.txt block and let Google crawl these pages (which could be infinite, as it is faceted navigation).
This will waste my crawl budget.
Any other ideas?
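To make the conflict concrete, here is a small sketch of the logic as I understand it: a meta robots noindex only counts if the crawler is allowed to fetch the page and read it. The class and function names are hypothetical, and the `blocked_by_robots_txt` flag is something you would supply yourself (for example from the robots.txt Tester).

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Split "NOINDEX,FOLLOW" into normalized lowercase directives.
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def noindex_is_effective(html: str, blocked_by_robots_txt: bool) -> bool:
    """Hypothetical check: noindex helps only if the page can be crawled."""
    parser = MetaRobotsParser()
    parser.feed(html)
    return "noindex" in parser.directives and not blocked_by_robots_txt


page = '<html><head><meta name="robots" content="NOINDEX,FOLLOW"></head></html>'
print(noindex_is_effective(page, blocked_by_robots_txt=True))
print(noindex_is_effective(page, blocked_by_robots_txt=False))
```

With the block in place the tag is never seen, which is exactly the trade-off against crawl budget I'm asking about.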
-
Anderss has pointed to the right article. With robots.txt blocking, Googlebot will not crawl (and discover links) from within the website, but what if references to these blocked pages are found elsewhere, on third-party websites? This is the situation you are in. So to fully stop Google from discovering and indexing these blocked pages, you should use the page-level meta robots tag on these pages. Once this is in place, the issue will fade away.
This issue has been addressed many times here on Moz.
Coming to your concern about the crawl budget: there is nothing to worry about, as Google will not crawl those blocked pages while it's on your website, since they are already blocked by the robots.txt file.
Hope it helps, my friend.
Best regards,
Devanur Rafi
-
Hi!
It could be that the pages had already been indexed before you added the directives to robots.txt.
I see that you have added the rel=canonical for the pages and that you now have noindex,follow. Is that recently added? If so, it could be wise to actually let GoogleBot access and crawl the pages again - and then they'll go away after a while. Then you could add the directive again later. See https://support.google.com/webmasters/answer/93710?hl=en&ref_topic=4598466 for more about this.
Hope this helps!
Anders
-
For example:
http://www.sekretza.com/eng/best-sellers-sekretza-products.html?price=1%2C1000
Is blocked by using:
Disallow: /*price=.... ?
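A quick way to sanity-check a wildcard rule like this offline is a small matcher. This is a simplified sketch under a few assumptions: it takes the rule to be exactly `/*price=` (the rule above is cut off, so that is a guess), it handles only `*` and a trailing `$` the way Google documents them, and it ignores Allow rules and user-agent groups. The function names are made up. Python's built-in `urllib.robotparser` is not used because, as far as I know, it does not handle `*` wildcards.

```python
import re
from urllib.parse import urlsplit


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate one Disallow pattern into a regex anchored at the path start."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # '*' matches any run of characters; everything else is literal.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_disallowed(url: str, disallow_rule: str) -> bool:
    """Check a URL's path plus query string against a single Disallow rule."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return rule_to_regex(disallow_rule).match(path) is not None


url = "http://www.sekretza.com/eng/best-sellers-sekretza-products.html?price=1%2C1000"
print(is_disallowed(url, "/*price="))
```

A match here only says the URL will not be crawled; as discussed above, it says nothing about whether the URL can still show up in the index.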