SEO Best Practices regarding Robots.txt disallow
-
I cannot find hard and fast direction about the following issue:
It looks like the robots.txt file on my server has been set up to disallow "account" and "search" pages within my site (Disallow: /Account/ and Disallow: /?search=), so I am receiving warnings from Google Search Console that URLs are being blocked by robots.txt. Do you recommend unblocking these URLs?
I'm getting a warning that over 18,000 URLs are blocked by robots.txt ("Sitemap contains urls which are blocked by robots.txt"). It seems that I wouldn't want that many URLs blocked?
Thank you!!
-
Mmm, it depends.
It's really hard for me to answer without knowing your site, but I would say you're heading in the right direction. You want to give Google more ways to reach your quality content.
Now, do you have any other pages that bring bots there through normal user navigation, or is it all search driven?
While Google can crawl pages that it discovers via internal/external links, it can't reproduce searches by typing into your site's search box, so I doubt those pages are extremely valuable unless you link to them somehow. In that case you may want to keep Google crawling them.
Whether you want them indexed is a different question. Being search results, they are probably aggregating information already present elsewhere on the site. For indexation purposes you may want to keep them out of the index (for example with a noindex directive) while still allowing the bot to crawl through them.
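As a rough illustration of "crawlable but not indexed", here is a minimal sketch that attaches an `X-Robots-Tag: noindex, follow` response header to search result URLs. The helper name `robots_headers` and the example URLs are hypothetical; the `search` parameter is taken from the Disallow rule in the question. Keep in mind that a crawler only sees this header if it is allowed to fetch the page, so the URL must not also be disallowed in robots.txt.

```python
from urllib.parse import urlsplit, parse_qs


def robots_headers(url):
    """Return extra response headers for a URL (hypothetical helper)."""
    # Parse the query string; keep blank values so "?search=" still counts.
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    if "search" in query:
        # Assumption: any URL carrying a "search" parameter is an internal
        # search result page. Ask crawlers not to index it, but still let
        # them follow the links it contains.
        return {"X-Robots-Tag": "noindex, follow"}
    return {}


print(robots_headers("https://www.example.com/?search=shoes"))
print(robots_headers("https://www.example.com/products/shoes"))
```

The same effect can be had with a robots meta tag in the page head; the header form is shown only because it is easy to apply by URL pattern.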
Again, beware of crawl budget: you don't want Google wandering around millions of search results instead of your money pages, unless you're able to let it crawl only a subset of them.
I hope this makes sense.
-
Thank you for your response! I'm going to do a bit more research, but I think I will keep disallowing "account" and unblock "search". The search feature on my site pulls up quality content, so it seems like I would want that to be crawled. Does this sound logical to you?
-
That could be completely normal. Google sends the warning because you're giving conflicting directions: you are preventing them from crawling (via robots.txt) pages you asked them to index (via the sitemap).
They do not know how important those pages may be for you, so you are the one who needs to assess what to do next.
Are those pages important to you? Do you want them in the index? If so, change your robots.txt rule; if not, remove them from the sitemap.
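To see which sitemap entries are causing the conflict, a small script can compare the sitemap URLs against the Disallow rules. The sketch below is a simplified assumption of how matching works: it does plain, case-sensitive prefix matching on path plus query string and ignores wildcards (`*`, `$`), Allow rules, and per-user-agent groups. The two rules come from the question; the sample URLs and function names are made up for illustration.

```python
from urllib.parse import urlsplit

# Disallow rules quoted in the question.
DISALLOW = ["/Account/", "/?search="]


def is_blocked(url, disallow=DISALLOW):
    """True if the URL's path (plus query) starts with any Disallow prefix."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # Simplified: prefix match only, case-sensitive, no wildcard support.
    return any(target.startswith(rule) for rule in disallow)


def conflicting_urls(sitemap_urls, disallow=DISALLOW):
    """Sitemap URLs that robots.txt would stop a crawler from fetching."""
    return [u for u in sitemap_urls if is_blocked(u, disallow)]


# Hypothetical sitemap entries.
sitemap_urls = [
    "https://www.example.com/",
    "https://www.example.com/?search=blue+widgets",
    "https://www.example.com/Account/login",
    "https://www.example.com/products/widget-1",
]

for url in conflicting_urls(sitemap_urls):
    print(url)
```

Every URL this prints needs one of the two fixes described above: either loosen the robots.txt rule or drop the URL from the sitemap.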
About the previous answer: robots.txt is not used to block hackers; quite the opposite. Hackers can easily read robots.txt to find the pages you'd like to block and then visit them, as they may be key pages (e.g. wp-admin). But let's not focus on that, as hackers have so many ways to find core pages that it's beside the point. Robots.txt is normally used to avoid duplication issues and to prevent Google from crawling low-value pages and wasting crawl budget.
-
Typically, you only want robots.txt to block access points that would allow hackers into your site like an admin page (e.g. www.examplesite.com/admin/). You definitely don't want it blocking your whole site. A developer or webmaster would be better at speaking to the specifics, but that's the quick, high-level answer.