Can URLs blocked with robots.txt hurt your site?
-
We have about 20 testing environments that contain duplicates of our indexed content. They are all blocked by robots.txt, and they appear in Google's index as blocked by robots.txt. Can they still count against us or hurt us?
I know the best practice for permanently removing them would be to use the noindex tag, but I'm wondering whether they can still hurt us if we leave them the way they are.
-
I'd say 90% no. First of all, check whether Google has indexed them. If it hasn't, your robots.txt should be enough. However, I would reinforce it by making sure those URLs are out of your sitemap file and that your robots.txt disallow rules apply to all crawlers (User-agent: *), not just Google, for example.
Google's duplicate content policies are tough, but it will respect simple directives such as robots.txt.
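To illustrate the point about disallowing all crawlers rather than only Google, here is a minimal sketch using Python's standard `urllib.robotparser`. The two robots.txt bodies, the URL, and the helper name `blocked_agents` are assumptions made up for the example, not anything taken from the asker's setup.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt bodies for a test environment (assumptions).
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
BLOCK_GOOGLE_ONLY = "User-agent: Googlebot\nDisallow: /\n"


def blocked_agents(robots_txt, url, agents):
    """Map each user agent to True if robots.txt disallows crawling url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: not parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    url = "https://test1.example.com/some-page"  # hypothetical URL
    agents = ["Googlebot", "Bingbot", "SomeOtherBot"]
    print(blocked_agents(BLOCK_ALL, url, agents))
    print(blocked_agents(BLOCK_GOOGLE_ONLY, url, agents))
```

With the Googlebot-only rule, the other crawlers are not blocked, which is the gap the `User-agent: *` rule closes.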
I had a case in the past where a customer had a dedicated IP and Google somehow found it, so you could see both the domain's pages and the IP's pages, with identical content. We simply added a .htaccess rule to point the IP requests to the domain, and even though the situation had been like that for a long time, it doesn't seem to have affected them. In theory Google penalizes duplicate content, but not in these particular cases; it is a matter of behavior.
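The actual fix in that case was a .htaccess rule, which isn't reproduced here. As a rough sketch of the same logic in Python: if the request's Host header is a bare IP address, answer with a redirect to the domain. The hostname `www.example.com`, the 301 status, and the function names are all assumptions for illustration, and only IPv4 hosts (with an optional port) are handled.

```python
import ipaddress

CANONICAL_HOST = "www.example.com"  # hypothetical canonical domain


def is_ip_host(host):
    """Return True if the Host header value is an IPv4 address (port allowed)."""
    hostname = host.split(":")[0]  # drop an optional ":port" suffix
    try:
        ipaddress.IPv4Address(hostname)
        return True
    except ValueError:
        return False


def redirect_for(host, path):
    """Return (status, location) when the request should be redirected, else None."""
    if is_ip_host(host):
        # Assumed permanent redirect so the IP-based duplicates consolidate on the domain.
        return 301, f"https://{CANONICAL_HOST}{path}"
    return None


if __name__ == "__main__":
    print(redirect_for("203.0.113.10", "/about"))
    print(redirect_for("www.example.com", "/about"))
```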
Regards!
-
I've seen people say that in "rare" cases, URLs blocked by robots.txt will be shown in search results, but I can't imagine that happening when they are duplicates of your content.
Robots.txt tells a search engine not to crawl a directory, but if another resource links to it, the search engine may know the URL exists, just not what it contains. It won't know whether the page is noindex or not because it doesn't crawl it, but since it knows the URL exists, it could, rarely, return it. The original of the duplicated content would be the better result, so that better result will be returned, and your test sites should not be...
As far as hurting your site, no way. The only exception is a page that WAS allowed, is a duplicate, is now NOT allowed, and hasn't been recrawled. Even in that case, I can't imagine it would hurt you much. I wouldn't worry about it.
(Also, noindex doesn't matter on these pages, at least to Google. Google will see the robots.txt disallow first and will not crawl the page. Until it crawls the page, it doesn't matter whether the page has one word or 300 directives; Google will never see them. So noindex really wouldn't help unless a page had already slipped through.)
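The ordering described above (robots.txt is checked before the page is ever fetched, so a noindex tag on a blocked page goes unread) can be sketched roughly like this. It is a toy model, not how Google's crawler is actually implemented; the function name, the return labels, and the crude string check for the meta tag are all assumptions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical page that carries a noindex meta tag.
NOINDEX_PAGE = '<html><head><meta name="robots" content="noindex"></head></html>'


def index_decision(robots_txt, url, page_html, agent="Googlebot"):
    """Toy model: robots.txt is consulted first; the HTML is read only if allowed."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if not parser.can_fetch(agent, url):
        # The page is never fetched, so page_html (and any noindex) is never seen.
        return "blocked"
    html = page_html.lower()
    # Crude check for a robots meta tag; a real crawler would parse the HTML.
    if 'name="robots"' in html and "noindex" in html:
        return "noindex"
    return "indexable"


if __name__ == "__main__":
    url = "https://test1.example.com/some-page"  # hypothetical URL
    print(index_decision("User-agent: *\nDisallow: /", url, NOINDEX_PAGE))
    print(index_decision("User-agent: *\nAllow: /", url, NOINDEX_PAGE))
```

In this model the noindex tag only takes effect once the robots.txt block is lifted and the page can be crawled.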
-
I don't believe they are going to hurt you. The message is more of a warning that, if you are trying to have these pages indexed, they can't currently be accessed. Since in this case you don't want them indexed, I don't believe you are suffering because of it.