SEO Best Practices regarding Robots.txt disallow
-
I cannot find hard and fast direction about the following issue:
It looks like the robots.txt file on my server has been set up to disallow "account" and "search" pages within my site, so I am receiving warnings from Google Search Console that URLs are being blocked by robots.txt (Disallow: /Account/ and Disallow: /?search=). Do you recommend unblocking these URLs?
I'm getting a warning that over 18,000 URLs are blocked by robots.txt ("Sitemap contains URLs which are blocked by robots.txt"). It seems I wouldn't want that many URLs blocked, would I?
Thank you!!
-
Mmm, it depends.
It's really hard to answer without knowing your site, but I would say you're heading in the right direction: you want to give Google more ways to reach your quality content.
Now, do you have any other pages bringing bots there via normal user navigation, or is it all search driven?
While Google can crawl pages discovered via internal/external links, it can't reproduce searches by typing into your search box, so I doubt those pages are extremely valuable unless you link to them somehow. In that case you may want to keep Google crawling them.
A different question is whether you want to "index" them: being searches, they probably aggregate information already present elsewhere on the site. For indexation purposes you may want to keep them out of the index while still allowing the bot to run through them.
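For example, a minimal sketch of that setup — the tag goes in the <head> of each search-results page, and note Google can only see it if the page is crawlable (i.e. not blocked in robots.txt):
<!-- on each internal search-results page: crawlable, but kept out of the index -->
<meta name="robots" content="noindex, follow">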
Again, beware of crawl budget: you don't want Google wandering around millions of search results instead of your money pages, unless you can limit crawling to a subset of them.
I hope this makes sense.
-
Thank you for your response! I'm going to do a bit more research, but I think I will keep "account" disallowed and unblock "search". The search feature on my site pulls up quality content, so it seems like I would want that to be crawled. Does this sound logical to you?
-
That could be completely normal. Google sends a warning because you're giving conflicting directions: you're preventing them from crawling pages (via robots.txt) that you asked them to index (via the sitemap).
They don't know how important those pages are to you, so you're the one who needs to assess what to do next.
Are those pages important to you? Do you want them in the index? If so, change your robots.txt rule; if not, remove them from the sitemap.
About the previous answer: robots.txt is not used to block hackers, quite the opposite. Hackers can easily see in robots.txt which pages you'd like to block and visit them, as those may be key pages (e.g. wp-admin). But let's not focus on that, since hackers have so many ways to find core pages that it's beside the point. Robots.txt is normally used to avoid duplication issues and to prevent Google from crawling low-value pages and wasting crawl budget.
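For illustration, the rules from the original question are exactly this kind of low-value-page blocking; as a complete file they would read:
# keep crawlers out of account pages and internal search results
User-agent: *
Disallow: /Account/
Disallow: /?search=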
-
Typically, you only want robots.txt to block access points that could let hackers into your site, like an admin page (e.g. www.examplesite.com/admin/). You definitely don't want it blocking your whole site. A developer or webmaster would be better placed to speak to the specifics, but that's the quick, high-level answer.
Related Questions
-
What happens to crawled URLs subsequently blocked by robots.txt?
Intermediate & Advanced SEO | AspenFasteners
We have a very large store with 278,146 individual product pages. Since these are all various sizes and packaging quantities of fewer than 200 product categories, my feeling is that Google would be better off making sure our category pages are indexed. I would like to block all product pages via robots.txt until we are sure all category pages are indexed, then unblock them. Our product pages rarely change and have no ratings or product reviews, so there is little reason for a search engine to revisit a product page. The sales team is afraid that blocking a previously indexed product page will result in it being removed from the Google index, and would prefer to submit the categories by hand, 10 per day, via requested crawling. Which is the better practice?
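For context, the rule being debated would look something like the sketch below (/products/ is a hypothetical path, since the store's actual URL structure isn't given). Note that robots.txt prevents crawling, not indexing: previously indexed URLs tend to linger in the index without fresh snippets rather than being removed outright.
# hypothetical: keep crawlers out of all product detail pages
User-agent: *
Disallow: /products/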
-
Robots.txt blocked internal resources in WordPress
Intermediate & Advanced SEO | Mat_C
Hi all, we've recently migrated a WordPress website from staging to live, but the robots.txt was deleted. I've created the following new one:
User-agent: *
Allow: /
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Allow: /wp-admin/admin-ajax.php
However, in the site audit on SemRush, I now get the mention that a lot of pages have issues with blocked internal resources in the robots.txt file. These blocked internal resources are all cached and minified CSS elements: links, images and scripts. Does this mean that Google won't crawl some parts of these pages with blocked resources correctly, and thus won't be able to follow these links and index the images? In other words, is this any cause for concern regarding SEO? Of course I can change the robots.txt again, but will URLs like https://example.com/wp-content/cache/minify/df983.js end up in the index? Thanks for your thoughts!
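If those assets do need to stay fetchable, one hedged fix — assuming the audit is pointing at files under /wp-content/cache/ — is to add explicit Allow rules for the asset types, since Google honors the most specific (longest) matching rule:
User-agent: *
# let Google render pages properly by fetching minified assets (paths illustrative)
Allow: /wp-content/cache/*.css
Allow: /wp-content/cache/*.js
Disallow: /wp-content/cache/
-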
Best Practices for Title Tags for Product Listing Pages
Intermediate & Advanced SEO | Kingalan1
My industry is commercial real estate in New York City. Our site has 300 real estate listings. The format we have been using for title tags is below; this is probably disastrous from an SEO perspective, and using numbers is a total waste of space. A few questions:
-Should we set listings to noindex if they are not content rich?
-If we do choose to index them, should we avoid titles listing square footage and dollar amounts?
-Since local SEO is critical, should the titles always list New York, NY or Manhattan, NY?
-I have read that titles should contain some form of branding. But our company name is Metro Manhattan Office Space, which would take up way too much space; even "Metro Manhattan" is long. Do we need to use the title tag for branding, or can we just focus on a brief description of page content incorporating one important phrase?
Our site is: www.metro-manhattan.com
Current titles:
| Turnkey Flatiron Tech Space | 2,850 SF $10,687/month |
| Gallery, Office Rental | Midtown, W. 57 St | 4,441 SF $24,055/month |
| Open Plan Loft | Flatiron, Chelsea | 2,414 SF $12,874/month |
| Tribeca Corner Loft | Varick Street | 2,267 SF $11,712/month |
| 275 Madison, LAW, P7, 3,252 SF, $65 - Manhattan, New York |
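A hedged sketch of a tighter format, purely illustrative: lead with the descriptive phrase and location, drop the figures, and keep any branding short.
<!-- hypothetical rewrite of the first listing's title -->
<title>Turnkey Flatiron Tech Space | Office Rental | Manhattan, NY</title>
-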
Faceted Navigation URLs Best Practices
Intermediate & Advanced SEO | viatrading1
Hi, we are developing new product pages with faceted filters. You can see it here: https://www.viatrading.com/wholesale-products/ We have a feature allowing Order By and Group By, which alters the order of all products. There will also be the option to view products as a table, which will contain the same products but with a different design and maybe slightly different content for each product. All this will happen without changing the URL, https://www.viatrading.com/all/ Is this the best practice? Thanks,
-
Best-practice URL structures with multiple filter combinations
Intermediate & Advanced SEO | digitalcrc
Hello, we're putting together a large piece of content that will have some interactive filtering elements. There are two types of filters: topics and object types. The architecture under the hood constrains us so that everything needs to be in URL parameters. If someone selects a single filter, this can look pretty clean:
www.domain.com/project?topic=firstTopic
or
www.domain.com/project?object=typeOne
The problems arise when people select multiple topics, potentially across two different filter types:
www.domain.com/project?topic=firstTopic-secondTopic-thirdTopic&object=typeOne-typeTwo
I've raised concerns around the structure in general, but it seems to be too late at this point, so now I'm scratching my head thinking of how best to get these indexed. I have two main concerns:
-A ton of near-duplicate content and hundreds of URLs being created and indexed with various filter combinations added
-Over-reacting to the first point above and over-canonicalizing/no-indexing combination pages to the detriment of the content as a whole
Would the best approach be to index each single topic filter individually, and canonicalize any combinations to the 'view all' page? I don't have much experience with e-commerce SEO (which this problem seems to have the most in common with), so any advice is greatly appreciated. Thanks!
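A hedged illustration of that canonicalization idea: each filter-combination page points its canonical at the unfiltered 'view all' URL (the domain is the asker's placeholder).
<!-- in the <head> of /project?topic=firstTopic-secondTopic&object=typeOne -->
<link rel="canonical" href="https://www.domain.com/project" />
-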
Image URLs - best practice
Hi - I'm assuming image URL best practice follows the same principles as non-image URLs (not too many files and so on). I notice a lot of web devs putting photos on subdomains, so I wonder if I'm missing something (I usually avoid subdomains like the plague)!
Intermediate & Advanced SEO | McTaggart
-
Robots.txt: how to exclude sub-directories correctly?
Intermediate & Advanced SEO | fablau
Hello here, I am trying to figure out the correct way to tell SEs to crawl this:
http://www.mysite.com/directory/
But not this:
http://www.mysite.com/directory/sub-directory/
or this:
http://www.mysite.com/directory/sub-directory2/sub-directory/...
But given that I have thousands of sub-directories with almost infinite combinations, I can't put the following definitions together in a manageable way:
Disallow: /directory/sub-directory/
Disallow: /directory/sub-directory2/
Disallow: /directory/sub-directory/sub-directory/
Disallow: /directory/sub-directory2/subdirectory/
etc...
I would end up having thousands of definitions to disallow all the possible sub-directory combinations. So, is the following a correct, better and shorter way to define what I want above:
Allow: /directory/$
Disallow: /directory/*
Would the above work? Any thoughts are very welcome! Thank you in advance. Best, Fab.
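For what it's worth, a minimal sketch of that pattern as a complete file: the rules only take effect under a User-agent line, and the $ and * wildcards are honored by Google and Bing but not guaranteed for every crawler.
User-agent: *
# allow only the exact /directory/ URL, block everything deeper
Allow: /directory/$
Disallow: /directory/*
-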
Best practice for redirects based on visitors' detected language
Intermediate & Advanced SEO | Damiano
One of our websites has two languages, English and Italian. The English pages are available at the root level:
www.site.com/ (English homepage)
www.site.com/page1
www.site.com/page2
The Italian pages are available under the /it/ level:
www.site.com/it (Italian homepage)
www.site.com/it/pagina1
www.site.com/it/pagina2
When an Italian visitor first visits www.site.com, we'd like to redirect them to www.site.com/it, but we don't know if that would impact search engine spiders (e.g. GoogleBot) in any way... Would it be better to do a JavaScript redirect, or an HTTP 3xx redirect? If so, which of the 3xx redirects should we use? Thank you
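A hedged sketch of the usual companion setup: serve the language-detection redirect as a 302 (it isn't permanent for all users, so a 301 would be misleading) and annotate both versions with hreflang so each language can rank in the right market. URLs are the asker's placeholders.
<!-- in the <head> of both the English and Italian homepages -->
<link rel="alternate" hreflang="en" href="https://www.site.com/" />
<link rel="alternate" hreflang="it" href="https://www.site.com/it" />
<link rel="alternate" hreflang="x-default" href="https://www.site.com/" />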