Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Robots.txt, does it need preceding directory structure?
-
Do you need the entire preceding path in robots.txt for it to match?
e.g:
I know if i add Disallow: /fish to robots.txt it will block
/fish
/fish.html
/fish/salmon.html
/fishheads
/fishheads/yummy.html
/fish.php?id=anythingBut would it block?:
en/fish
en/fish.html
en/fish/salmon.html
en/fishheads
en/fishheads/yummy.html
**en/fish.php?id=anything(taken from Robots.txt Specifications)** I'm hoping it actually wont match, that way writing this particular robots.txt will be much easier!
As basically I'm wanting to block many URL that have BTS- in such as:
http://www.example.com/BTS-something
http://www.example.com/BTS-somethingelse
http://www.example.com/BTS-thingybobBut have other pages that I do not want blocked, in subfolders that also have BTS- in, such as:
http://www.example.com/somesubfolder/BTS-thingy
http://www.example.com/anothersubfolder/BTS-otherthingyThanks for listening
-
Yes this is what I thought, but wanted some second opinions.
Although I wouldn't actually need a wild card after BTS, as just leaving it open is the same as using a wildcard:
/fish*.......... Equivalent to "/fish" -- the trailing wildcard is ignored. https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt Thanks for the link, I'll take a look
-
You're right in with the **Disallow: /fish **in the robots file blocking all those initial links, but if you wanted to block everything inside the /en/ folder, you would need to do disallow: /en/fish
You could use a wildcard in the robots.txt file to do something along the lines of Disallow: /BTS-*
This _'should' _work, but it's always worth checking using a tool to make sure it's all implemented correctly. Distilled did a post a while back about a JS tool which allows you to test if robots.txt files work correctly which can be found here - http://www.distilled.net/blog/seo/js-bookmarklet-for-checking-if-a-page-is-blocked-by-robots-txt/
In addition to this, you could also use the 'blocked URLs' tool in GWT to see if the pages are successfully blocked once you've implemented the code.
Hope this helps!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Please help need some advice?
Can any of you guys please help me I have alerts on links coming in and it looks like recently someone did this, it looks maliciously done as it is only our domain mentioned and most are brand new posts? http://testosteroneclinicindenve53950.shotblogs.com/testosterone-clinic-in-denver-fundamentals-explained-6102386 http://claytondmnnp.ampedpages.com/Details-Fiction-and-testosterone-clinic-in-denver-16897309 http://vinylvehiclecarwrap38041.alltdesign.com/a-review-of-vinyl-vehicle-car-wrap-9574042 http://devinxccct.educationalimpactblog.com/1784474/little-known-facts-about-vinyl-vehicle-car-wrap http://keeganbsftf.ka-blogs.com/7488539/how-vinyl-vehicle-car-wrap-can-save-you-time-stress-and-money http://andybxoes.thezenweb.com/vinyl-vehicle-car-wrap-Fundamentals-Explained-17581028 http://kylerhfdzu.blogkoo.com/not-known-details-about-vinyl-vehicle-car-wrap-9029141 http://troyytkyn.timeblog.net/7695911/the-greatest-guide-to-vinyl-vehicle-car-wrap http://waylontyzab.pointblog.net/testosterone-clinic-in-denver-Secrets-16335972 http://testosteroneclinicindenve30516.onesmablog.com/Top-testosterone-clinic-in-denver-Secrets-17252737 http://emiliogkmop.blogofoto.com/7667522/top-guidelines-of-testosterone-clinic-in-denver http://caidenaczxt.blogs-service.com/7514172/testosterone-clinic-in-denver-fundamentals-explained http://daltonpyfms.mybjjblog.com/5-simple-statements-about-testosterone-clinic-in-denver-explained-6517932 Should I try to disavow these and submit to google or will google know our site which has been up for 5 years is not doing this? Should I do any of these https://tehnoblog.org/google-webmaster-tools-my-website-got-bombed-with-backlinks-what-to-do/
Intermediate & Advanced SEO | | BobAnderson0 -
Disallow URLs ENDING with certain values in robots.txt?
Is there any way to disallow URLs ending in a certain value? For example, if I have the following product page URL: http://website.com/category/product1, and I want to disallow /category/product1/review, /category/product2/review, etc. without disallowing the product pages themselves, is there any shortcut to do this, or must I disallow each gallery page individually?
Intermediate & Advanced SEO | | jmorehouse0 -
Wordpress Blog in 2 languages. How to SEO or structure it?
Hi Moz community, I have got a wordpress blog currently in the spanish language. I want to create the same blog content but in english version. (manually translate it to english instead of using translation service such as Google Translate). How should i structure the blog for SEO? How will it work? Any structure markups i should know about? Any examples? Thanks
Intermediate & Advanced SEO | | WayneRooney0 -
Citation/Business Directory Question...
A company I work for has two numbers... one for the std call centre and one for tracking SEO. Now, if local citation/business directory listings have the same address but different numbers, will this affect local/other SEO results? Any help is greatly appreciated! 🙂
Intermediate & Advanced SEO | | geniusenergyltd0 -
Robots.txt: how to exclude sub-directories correctly?
Hello here, I am trying to figure out the correct way to tell SEs to crawls this: http://www.mysite.com/directory/ But not this: http://www.mysite.com/directory/sub-directory/ or this: http://www.mysite.com/directory/sub-directory2/sub-directory/... But with the fact I have thousands of sub-directories with almost infinite combinations, I can't put the following definitions in a manageable way: disallow: /directory/sub-directory/ disallow: /directory/sub-directory2/ disallow: /directory/sub-directory/sub-directory/ disallow: /directory/sub-directory2/subdirectory/ etc... I would end up having thousands of definitions to disallow all the possible sub-directory combinations. So, is the following way a correct, better and shorter way to define what I want above: allow: /directory/$ disallow: /directory/* Would the above work? Any thoughts are very welcome! Thank you in advance. Best, Fab.
Intermediate & Advanced SEO | | fablau1 -
Recovering from robots.txt error
Hello, A client of mine is going through a bit of a crisis. A developer (at their end) added Disallow: / to the robots.txt file. Luckily the SEOMoz crawl ran a couple of days after this happened and alerted me to the error. The robots.txt file was quickly updated but the client has found the vast majority of their rankings have gone. It took a further 5 days for GWMT to file that the robots.txt file had been updated and since then we have "Fetched as Google" and "Submitted URL and linked pages" in GWMT. In GWMT it is still showing that that vast majority of pages are blocked in the "Blocked URLs" section, although the robots.txt file below it is now ok. I guess what I want to ask is: What else is there that we can do to recover these rankings quickly? What time scales can we expect for recovery? More importantly has anyone had any experience with this sort of situation and is full recovery normal? Thanks in advance!
Intermediate & Advanced SEO | | RikkiD220 -
Meta NoIndex tag and Robots Disallow
Hi all, I hope you can spend some time to answer my first of a few questions 🙂 We are running a Magento site - layered/faceted navigation nightmare has created thousands of duplicate URLS! Anyway, during my process to tackle the issue, I disallowed in Robots.txt anything in the querystring that was not a p (allowed this for pagination). After checking some pages in Google, I did a site:www.mydomain.com/specificpage.html and a few duplicates came up along with the original with
Intermediate & Advanced SEO | | bjs2010
"There is no information about this page because it is blocked by robots.txt" So I had added in Meta Noindex, follow on all these duplicates also but I guess it wasnt being read because of Robots.txt. So coming to my question. Did robots.txt block access to these pages? If so, were these already in the index and after disallowing it with robots, Googlebot could not read Meta No index? Does Meta Noindex Follow on pages actually help Googlebot decide to remove these pages from index? I thought Robots would stop and prevent indexation? But I've read this:
"Noindex is a funny thing, it actually doesn’t mean “You can’t index this”, it means “You can’t show this in search results”. Robots.txt disallow means “You can’t index this” but it doesn’t mean “You can’t show it in the search results”. I'm a bit confused about how to use these in both preventing duplicate content in the first place and then helping to address dupe content once it's already in the index. Thanks! B0 -
Product URL structure for a marketplace model
Hello All. I run an online marketplace start-up that has around 10000 products listed from around 1000+ sellers. We are a similar model to etsy/ebay in the sense that we provide a platform but sellers to list products and sell them. I have a URL structure question. I have read http://www.seomoz.org/q/how-to-define-best-url-structure-for-product-pages which seems to show everyone suggests to use Products: products/category/product-name Categories: products/category as the structure for product pages. Because we are a marketplace (our category structure has multiple tiers sometimes up to 3) our sellers choose a category for products to go in. How we have handled this before is we have used: Products: products/last-tier-category-chosen/product-name (eg: /products/sweets-and-snacks/fluffy-marshmallows) Categories: products/category (eg: /products/sweets-and-snacks) However we have two issues with this: The categories can sometimes change, or users can change them which means the links completely change and undo any link building work built up. The urls can get a bit long and am worried that the most important data (the fluffy marshmallow that reflects in the page title and content) is left till too late in the URL. As a result we plan to change our URL structure (we are going through a rebuild anyhow so losing old links is not an issue here) so that the new structure was: Products: products/product-name(eg: /products/fluffy-marshmallows) Categories: products/category (eg: /products/sweets-and-snacks) My concern about doing this however, and question here, is whether this willnegatively impact the "structure" of pages when google crawls our marketplace.Because "fluffy marshmallows" will no longer technically fit into the url structure of "sweets and snacks". I dont know if this would have a negative impact or not. FYI etsy (one of the largest marketplace models in the world) us the latter approach and do not have categories in product urls, eg: listing/42003836/vintage-french-industrial-inspired-side Any ideas on this? Many thanks!
Intermediate & Advanced SEO | | LiamPatterson0