Robots.txt: how to exclude sub-directories correctly?
-
Hello here,
I am trying to figure out the correct way to tell SEs to crawls this:
http://www.mysite.com/directory/
But not this:
http://www.mysite.com/directory/sub-directory/
or this:
http://www.mysite.com/directory/sub-directory2/sub-directory/...
But with the fact I have thousands of sub-directories with almost infinite combinations, I can't put the following definitions in a manageable way:
disallow: /directory/sub-directory/
disallow: /directory/sub-directory2/
disallow: /directory/sub-directory/sub-directory/
disallow: /directory/sub-directory2/subdirectory/
etc...
I would end up having thousands of definitions to disallow all the possible sub-directory combinations.
So, is the following way a correct, better and shorter way to define what I want above:
allow: /directory/$
disallow: /directory/*
Would the above work?
Any thoughts are very welcome! Thank you in advance.
Best,
Fab.
-
I mentioned both. You add a meta robots to noindex and remove from the sitemap.
-
But google is still free to index a link/page even if it is not included in xml sitemap.
-
Install Yoast Wordpress SEO plugin and use that to restrict what is indexed and what is allowed in a sitemap.
-
I am using wordpress, Enfold theme (themeforest).
I want some files to be accessed by google, but those should not be indexed.
Here is an example: http://prntscr.com/h8918o
I have currently blocked some JS directories/files using robots.txt (check screenshot)
But due to this I am not able to pass Mobile Friendly Test on Google: http://prntscr.com/h8925z (check screenshot)
Is its possible to allow access, but use a tag like noindex in the robots.txt file. Or is there any other way out.
-
Yes, everything looks good, Webmaster Tools gave me the expected results with the following directives:
allow: /directory/$
disallow: /directory/*
Which allows this URL:
http://www.mysite.com/directory/
But doesn't allow the following one:
http://www.mysite.com/directory/sub-directory2/...
This page also gives an update similar to mine:
https://support.google.com/webmasters/answer/156449?hl=en
I think I am good! Thanks
-
Thank you Michael, it is my understanding then that my idea of doing this:
allow: /directory/$
disallow: /directory/*
Should work just fine. I will test it within Google Webmaster Tools, and let you know if any problems arise.
In the meantime if anyone else has more ideas about all this and can confirm me that would be great!
Thank you again.
-
I've always stuck to Disallow and followed -
"This is currently a bit awkward, as there is no "Allow" field. The easy way is to put all files to be disallowed into a separate directory, say "stuff", and leave the one file in the level above this directory:"
http://www.robotstxt.org/robotstxt.html
From https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt this seems contradictory
|
/*
| equivalent to / | equivalent to / | Equivalent to "/" -- the trailing wildcard is ignored. |I think this post will be very useful for you - http://moz.com/community/q/allow-or-disallow-first-in-robots-txt
-
Thank you Michael,
Google and other SEs actually recognize the "allow:" command:
https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt
The fact is: if I don't specify that, how can I be sure that the following single command:
disallow: /directory/*
Doesn't prevent SEs to spider the /directory/ index as I'd like to?
-
As long as you dont have directories somewhere in /* that you want indexed then I think that will work. There is no allow so you don't need the first line just
disallow: /directory/*
You can test out here- https://support.google.com/webmasters/answer/156449?rd=1
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
No index detected in robots meta tag GSC issue_Help Please
Hi Everyone, We just did a site migration ( URL structure change, site redesign, CMS change). During migration, dev team messed up badly on a few things including SEO. The old site had pages canonicalized and self canonicalized <> New site doesn't have anything (CMS dev error) so we are working retroactively to add canonicalization mechanism The legacy site had URL’s ending with a trailing slash “/” <> new site got redirected to Set of url’s without “/” New site action : All robots are allowed: A new sitemap is submitted to google search console So here is my problem (it been a long 24hr night for me 🙂 ) 1. Now when I look at GSC homepage URL it says that old page is self canonicalized and currently in index (old page with a trailing slash at the end of URL). 2. When I try to perform a live URL test, I get the message "No: 'noindex' detected in 'robots' meta tag" , so indexation cant be done. I have no idea where noindex is coming from. 3. Robots.txt in search console still showing old file ( no noindex there ) I tried to submit new file but old one still coming up. When I click on "See live robots.txt" I get current robots. 4. I see that old page is still canonicalized and attempting to index redirected old page might be confusing google Hope someone can help to get the new page indexed! I really need it 🙂 Please ping me if you need more clarification. Thank you ! Thank you
Intermediate & Advanced SEO | | bgvsiteadmin1 -
Using a Sub Domain as a Main Domain?
Hi, I'm working on a site at the moment and the sub domain is acting as the main domain. This occurred when the site was redesigned and built on a sub domain for testing but it was never moved to the main domain when it went live (a couple of years ago). So little or no pages are live on domain.com but all on sub.domain.com. It's a large company but they have very poor rankings. Would you recommend that they move the sub domain back into the root folder? Does this involve renaming/re-pointing URLs? Thanks Louise
Intermediate & Advanced SEO | | MVIreland1 -
Correct keywords Anchor text for links passing
Hi i have some old pages with more link equity, i m planning to key some bestseller in the main content.. my question is on best use of anchor text, can i use the below for eg: Product name is Chloride Exide Safepower Cs 7-12 12V Sealed Battery so i want to use the key word which is "12v 7ah Battery" in anchor text or buy 12v 7ah battery in Anchor text, will this google consider as spam?? Pls suggest
Intermediate & Advanced SEO | | Rahim1190 -
301 from old site to new one , Should I point to home page or sub category page ?
Hey Seo Experts, I have a small website ranking for few terms like cabinets sale, buy etc . However what i have now decided is to launch a New website with more different products like living room furniture, wardrobes etc . Out of all these categories on new website Cabinets is one of the SubCategory . Now I do not want to have 2 websites . So wanted to 301 from small cabinets website to newly created website. Some of the doubts I have at the moment is ? 1 Should I REDIRECT 301 to sub category (i,e cabinets) which is purely related to Cabinets or Do a Redirect to HOME PAGE . As I also need more Authority to home page as well , as this is relatively new website ? 2 Second question related to this. If you have multiple sub domains does it divide the total authority & TF.Or it is just Ok to have multiple Sub domains if needed ? Any advice appreciated !! Thanks .
Intermediate & Advanced SEO | | aus00070 -
Sub-Domain or Folder. Which is better for SEO.?
Hey all. I just need clarification that which one is better to use for big property or travel portal. Check below example: I have a website which runs for multiple location like india, uk, canada, uae. For every location the content is different. So my question is that for better SEO results should i use india.xyz.com or **xyz.com/india/ **. One more example **canada.xyz.com **or xyz.com/canada/ Can anyone please suggest which one is better. Thanks in advance.
Intermediate & Advanced SEO | | PFX1110 -
Does Yahoo Directory Listing Pass Authority with PA:0 and 0 links from 0 Root Domains?
So we already have our brand listed in Yahoo Directory for a few years but today I noticed it is not listed in OSE and the pages we're listed on in Yahoo Dir are PA:0 / DA: 100 with 0 links from 0 Root Domains! (or with a PA:1) Does this mean no juice is being passed at all for this listing? Does it mean it is not even spidered by Google then as how can it be found if no inlinks? Does any authority still get passed from Yahoos domain with DA100 despite pages being PA0? I ask because I'm considering adding another company to Yahoo Dir to get some authority rather than traffic.
Intermediate & Advanced SEO | | emerald0 -
Duplicate content on sub-domains?
I have 2 subdamains intented for 2 different countries (Colombia and Venezuela) ve.domain.com and co.domain.com. The site it's an e-commerce with over a million products available so they have the same page with the same content on both sub-domains....the only differences are the prices a payment options. Does google take that as duplicate content? Thanks
Intermediate & Advanced SEO | | daniel.alvarez0 -
What directories are best for health or health related products?
I am trying to find out if there are any reputable directories related to health supplements and general health information.
Intermediate & Advanced SEO | | DonovanHarrell0