Robots.txt: how to exclude sub-directories correctly?
-
Hello here,
I am trying to figure out the correct way to tell SEs to crawls this:
http://www.mysite.com/directory/
But not this:
http://www.mysite.com/directory/sub-directory/
or this:
http://www.mysite.com/directory/sub-directory2/sub-directory/...
But with the fact I have thousands of sub-directories with almost infinite combinations, I can't put the following definitions in a manageable way:
disallow: /directory/sub-directory/
disallow: /directory/sub-directory2/
disallow: /directory/sub-directory/sub-directory/
disallow: /directory/sub-directory2/subdirectory/
etc...
I would end up having thousands of definitions to disallow all the possible sub-directory combinations.
So, is the following way a correct, better and shorter way to define what I want above:
allow: /directory/$
disallow: /directory/*
Would the above work?
Any thoughts are very welcome! Thank you in advance.
Best,
Fab.
-
I mentioned both. You add a meta robots to noindex and remove from the sitemap.
-
But google is still free to index a link/page even if it is not included in xml sitemap.
-
Install Yoast Wordpress SEO plugin and use that to restrict what is indexed and what is allowed in a sitemap.
-
I am using wordpress, Enfold theme (themeforest).
I want some files to be accessed by google, but those should not be indexed.
Here is an example: http://prntscr.com/h8918o
I have currently blocked some JS directories/files using robots.txt (check screenshot)
But due to this I am not able to pass Mobile Friendly Test on Google: http://prntscr.com/h8925z (check screenshot)
Is its possible to allow access, but use a tag like noindex in the robots.txt file. Or is there any other way out.
-
Yes, everything looks good, Webmaster Tools gave me the expected results with the following directives:
allow: /directory/$
disallow: /directory/*
Which allows this URL:
http://www.mysite.com/directory/
But doesn't allow the following one:
http://www.mysite.com/directory/sub-directory2/...
This page also gives an update similar to mine:
https://support.google.com/webmasters/answer/156449?hl=en
I think I am good! Thanks
-
Thank you Michael, it is my understanding then that my idea of doing this:
allow: /directory/$
disallow: /directory/*
Should work just fine. I will test it within Google Webmaster Tools, and let you know if any problems arise.
In the meantime if anyone else has more ideas about all this and can confirm me that would be great!
Thank you again.
-
I've always stuck to Disallow and followed -
"This is currently a bit awkward, as there is no "Allow" field. The easy way is to put all files to be disallowed into a separate directory, say "stuff", and leave the one file in the level above this directory:"
http://www.robotstxt.org/robotstxt.html
From https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt this seems contradictory
|
/*
| equivalent to / | equivalent to / | Equivalent to "/" -- the trailing wildcard is ignored. |I think this post will be very useful for you - http://moz.com/community/q/allow-or-disallow-first-in-robots-txt
-
Thank you Michael,
Google and other SEs actually recognize the "allow:" command:
https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt
The fact is: if I don't specify that, how can I be sure that the following single command:
disallow: /directory/*
Doesn't prevent SEs to spider the /directory/ index as I'd like to?
-
As long as you dont have directories somewhere in /* that you want indexed then I think that will work. There is no allow so you don't need the first line just
disallow: /directory/*
You can test out here- https://support.google.com/webmasters/answer/156449?rd=1
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Sub-directories or Nah
Hey MOz Squad, So I have a locksmith company with 7 locations all up and down the west coast. We set up each location in a sub-directory (mainly because they used to have other websites for these locations and we wanted to try and keep the juice so we 301 redirected back to these subdiretories) But it's making our efforts so tough because every time I want to change something I have to do it seven times! DO the sub directories really matter that munch for rank? One of my partners says google treats it like it own website and trying to rank just a page on a website for a certain city is harder. What do you guys think Keep the subs or ditch em?
Intermediate & Advanced SEO | | Meier0 -
How much SEO damage would it do having a subdomain site rather directory site?
Hi all! With a coleague we were arguing about what is better: Having a subdomain or a directory.
Intermediate & Advanced SEO | | Gaston Riera
Let me explain some more, this is about the cases: Having a multi-language site: Where en.domain.com or es.domain.com rather than domain.com/en/ or domain.com/es/ Having a Mobile and desktop version: m.domain.com or domain.com rather than domain.com/m or just domain.com. Having multiple location websites, you might figure. The dicussion started with me saying: Its better to have a directory site.
And my coleague said: Its better to have a subdomain site. Some of the reasons that he said is that big companies (such as wordpress) are doing that. And that's better for the business.
My reasons are fully based on this post from Rand Fishkin: Subdomains vs. Subfolders, Rel Canonical vs. 301, and How to Structure Links for SEO - Whiteboard Friday So, what does the community have to say about this?
Who should win this argue? GR.0 -
Sub-domain vs Root domain
I have recently taken over a website (website A) that has a domain authority of 33/100 and is linked to from 39 root domains. I have not yet selected any keywords to target so am currently unsure of ranking positions. However, website A is for a division of a company that has its own separate website (website B) which has a domain authority of 58/100 and over 1000 legitimate linking root domains. I have the option of moving website A to a sub-domain of website B. I also have the option of having website B provide a followed link to website A. So, my question is, for SEO purposes, is my website better off remaining on its own existing domain or is it likely to rank higher as a sub-domain of website B? I am sure there are pros and cons for both options but some opinions would be much appreciated.
Intermediate & Advanced SEO | | BallyhooLtd0 -
Meta robots or robot.txt file?
Hi Mozzers! For parametric URL's would you recommend meta robot or robot.txt file?
Intermediate & Advanced SEO | | eLab_London
For example: http://www.exmaple.com//category/product/cat no./quickView I want to stop indexing /quickView URLs. And what's the real difference between the two? Thanks again! Kay0 -
I want to Disavow some more links - but I'm only allowed one .txt file?
Hey guys, Wondering if you good people could help me out on this one? A few months back (June 19) I disavowed some links for a client having uploaded a .txt file with the offending domains attached. However, recently I've noticed some more dodgy-looking domains being indexed to my client's site so went about creating a new "Disavow List". When I went to upload this new list I was informed that I would be replacing the existing file. So, my question is, what do I do here? Make a new list with both old and new domains that I plan on disavowing and replace the existing one? Or; Just replace the existing .txt file with the new file because Google has recognised I've already disavowed those older links?
Intermediate & Advanced SEO | | Webrevolve0 -
Is our robots.txt file correct?
Could you please review our robots.txt file and let me know if this is correct. www.faithology.com/robots.txt Thank you!
Intermediate & Advanced SEO | | BMPIRE0 -
Are paid directories any good?
Hi Everyone, I have have listing our site in free directories. Are there any paid directories that are worth listing in? Our company and site is based in New Zealand. Thanks, Pete
Intermediate & Advanced SEO | | Hardley10 -
Are sub domains considered completely different than the root domain?
We have a project that is going to generate duplicate content. If we move the new content to a sub-domain (E.g. product.domain.com) will it still be considered duplicate content to the root domain? Or is it like having two completely different domains? Thanks!
Intermediate & Advanced SEO | | tripled5110