Confirming robots.txt rules for deep directories
-
Just want to make sure I understand exactly what I am doing
If I place this in my Robots.txt
Disallow: /root/this/that
By doing this I want to make sure that I am ONLY blocking the /that/ directory and anything beneath it. I want to make sure that /root/this/ still stays in the index; it's just the /that/ directory I want gone.
Am I correct in understanding this?
-
That's right!
Disallow: /root/this/ will block the complete directory, whereas
Disallow: /root/this/that will only block "that" within "this".
Hope this helps!
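The prefix-matching behaviour described above can be sketched with Python's standard urllib.robotparser (the /root/this/that paths are the hypothetical ones from the question):

```python
from urllib.robotparser import RobotFileParser

# Simulate a robots.txt containing only the rule from the question.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /root/this/that",
])

# Pages directly under /root/this/ stay crawlable...
print(rp.can_fetch("*", "/root/this/page.html"))  # True
# ...while /root/this/that and everything beneath it is blocked,
# because Disallow rules match by URL prefix.
print(rp.can_fetch("*", "/root/this/that"))       # False
print(rp.can_fetch("*", "/root/this/that/deep"))  # False
```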
Related Questions
-
Unsolved Using NoIndex Tag instead of 410 Gone Code on Discontinued products?
Hello everyone, I am very new to SEO and I wanted to get some input & second opinions on a workaround I am planning to implement on our Shopify store. Any suggestions, thoughts, or insight you have are welcome & appreciated! For those who aren't aware, Shopify as a platform doesn't allow us to send a 410 Gone code/error under any circumstance. When you delete or archive a product/page, it becomes unavailable on the storefront. Unfortunately, the only thing Shopify natively allows me to do is set up a 301 redirect. So when we are forced to discontinue a product, customers currently get a 404 error when trying to go to that old URL. My planned workaround is to automatically detect when a product has been discontinued and add the NoIndex meta tag to the product page. The product page will stay up but be unavailable for purchase. I am also adjusting the LD+JSON to list the product's availability as Discontinued instead of InStock/OutOfStock.
Technical SEO | | BakeryTech
Then I let the page sit for a few months so that crawlers have a chance to recrawl and remove the page from their indexes. I think that is how that works?
Once 3 or 6 months have passed, I plan on archiving the product, followed by setting up a 301 redirect pointing to our internal search results page. The redirect will send the user to search with a query aimed at similar products. That should prevent people with open tabs, bookmarks, and direct links to that page from receiving a 404 error. I do have Google Search Console set up and integrated with our site, but manually telling Google to remove a page obviously only impacts their index. Will this work the way I think it will?
Will search engines remove the page from their indexes if I add the NoIndex meta tag after it has already been indexed?
Is there a better way I should implement this? P.S. For those wondering why I am not disallowing the page URL in the robots.txt: Shopify won't allow me to call collection or product data from within the template that assembles the robots.txt, so I can't automatically add product URLs to the list.
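For what it's worth, the LD+JSON change the poster describes can be sketched like this, assuming standard schema.org Product markup (the product_jsonld helper and its values are hypothetical, not Shopify's actual template code):

```python
import json

def product_jsonld(name, price, discontinued):
    """Build schema.org Product JSON-LD; the availability value switches
    to Discontinued once the product is retired (hypothetical helper)."""
    availability = ("https://schema.org/Discontinued" if discontinued
                    else "https://schema.org/InStock")
    return json.dumps({
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "offers": {
            "@type": "Offer",
            "price": price,
            "availability": availability,
        },
    }, indent=2)

print(product_jsonld("Example Cake Pan", "19.99", discontinued=True))
```

schema.org does define Discontinued as a valid ItemAvailability value, so the approach is at least structurally sound.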
Pages flagged in Search Console as having a "noindex" tag do not have a meta robots tag??
Hi, I am running a technical audit on a site which is causing me a few issues. The site is small and awkwardly built using lots of JS, animations, and dynamic URL extensions (bit of a nightmare). I can see that it has only 5 pages being indexed in Google despite having over 25 pages submitted to Google via the sitemap in Search Console. The beta Search Console is telling me that there are 23 URLs marked with a 'noindex' tag, however when I go to view the page source and check the code of these pages, there are no meta robots tags at all - I have also checked the robots.txt file. Also, both Screaming Frog and DeepCrawl are failing to pick up these URLs, so I am at a bit of a loss about how to find out what's going on. Inevitably I believe the creative agency who built the site had no idea about general website best practice, and that the dynamic URL extensions may have something to do with the no-indexing. Any advice on this would be really appreciated. Are there any other ways of no-indexing pages which the dev / creative team might have implemented by accident? What am I missing here? Thanks,
Technical SEO | | NickG-123
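One often-overlooked source of a noindex directive that never appears in the page source is the X-Robots-Tag HTTP header, which a server can attach to any response. A quick hedged check with Python's standard library (the function names are illustrative, and the URL in the usage comment is a placeholder):

```python
from urllib.request import Request, urlopen

def header_has_noindex(tags):
    """True if any X-Robots-Tag header value contains a noindex directive."""
    return any("noindex" in tag.lower() for tag in tags)

def fetch_robots_headers(url):
    """HEAD-request `url` and return all X-Robots-Tag header values sent."""
    req = Request(url, method="HEAD", headers={"User-Agent": "seo-audit"})
    with urlopen(req) as resp:
        return resp.headers.get_all("X-Robots-Tag") or []

# Usage (placeholder URL):
# print(header_has_noindex(fetch_robots_headers("https://example.com/page")))
```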
Pages giving both 200 and 302 response codes?
We are having some issues with response codes on our product pages on our new site. It first came to my attention with the Mozbot crawl, which was picking up 1000s of 302 redirects, but when I checked them manually there was no redirect (and even the Moz toolbar was giving a 200 status). I then checked with this tool http://tools.seobook.com/server-header-checker/?page=single&url=https%3A%2F%2Fwww.equipashop.ie%2Fshop-fittings-retail-equipment%2Fgridwall%2Fgridwall-shelves%2Fflat-gridwall-shelf.html&useragent=2&typeProtocol=11
Technical SEO | | PaddyDisplays
And it's showing that there are 2 responses, a 302 and a 200 (but the same tool under the Googlebot setting only shows the 200 status). I'm also getting no warning about it in WMT. Does anyone know what's happening here and how worried about it I should be, as it seems Google is using only the 200 status. BTW, the developer thinks it's something to do with how the browser is handling the canonical link, but I'm not convinced. Thanks
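One way to reproduce what the header-checker tool is doing is to request the page with different User-Agent strings and compare the raw status codes without following redirects. A sketch using Python's standard library (the placeholder URL stands in for the product page; this is a diagnostic aid, not a statement about what the site actually serves):

```python
import http.client
from urllib.parse import urlparse

def status_for(url, user_agent):
    """Fetch `url` with the given User-Agent and return the raw HTTP
    status code, without following redirects (so a 302 shows as 302)."""
    parts = urlparse(url)
    conn_cls = (http.client.HTTPSConnection
                if parts.scheme == "https" else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path, headers={"User-Agent": user_agent})
        return conn.getresponse().status
    finally:
        conn.close()

# Usage (placeholder URL) -- compare what a browser and Googlebot receive:
# for ua in ("Mozilla/5.0", "Googlebot/2.1"):
#     print(ua, status_for("https://example.com/product-page", ua))
```

If the two user agents get different codes, the server (or a CDN/plugin in front of it) is varying its response by User-Agent, which would explain crawlers and browsers disagreeing.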
SEO and SSL error (Error code: sec_error_revoked_certificate)
Hi. An error occurred during a connection to esta-register.org: Peer's Certificate has been revoked. (Error code: sec_error_revoked_certificate) **I want to know whether this error can affect SEO or not?**
Technical SEO | | vahidafshari45
Robots.txt to disallow /index.php/ path
Hi SEOmoz, I have a problem with my Joomla site (yeah - me too!). I get a large amount of /index.php/ URLs despite using a program to handle these issues. The URLs cause indexation errors with Google (404). Now, I fixed this issue once before, but the problem persists. So I thought, instead of wasting more time, couldn't I just disallow all paths containing /index.php/? I don't use that extension, but would it cause me any problems from an SEO perspective? How do I disallow all index.php URLs? Is it a simple: Disallow: /index.php/
Technical SEO | | Mikkehl
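For URLs that start with /index.php/, yes, it is that simple: a Disallow rule blocks every URL beginning with the given path. A sketch with Python's standard urllib.robotparser, which implements the original prefix-only standard (note that Google additionally supports * wildcards for cases where /index.php/ appears mid-URL):

```python
from urllib.robotparser import RobotFileParser

# Simulate the single-rule robots.txt from the question.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /index.php/",
])

print(rp.can_fetch("*", "/index.php/some-article"))  # False (blocked)
print(rp.can_fetch("*", "/some-article"))            # True (clean URL)
```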
Do You Have To Have Access to Website Code to Use Open Graph
I am not a website programmer and all of our websites are in WordPress. I never change the code on the backend. Is this a necessity if one wants to use Open Graph?
Technical SEO | | dahnyogaworks
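For context, Open Graph tags are ordinary meta elements in the page head, so they do require some way of editing the template, though in WordPress a plugin can typically inject them without touching code by hand. A minimal example (all values are placeholders):

```html
<meta property="og:title" content="Page Title" />
<meta property="og:description" content="A short description of the page." />
<meta property="og:image" content="https://example.com/image.jpg" />
<meta property="og:url" content="https://example.com/page/" />
```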
No-indexing URLs including query strings with robots.txt
Dear all, how can I block URLs/pages with query strings like page.html?dir=asc&order=name with robots.txt? Thanks!
Technical SEO | | HMK-NL
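Google and Bing support * wildcards in robots.txt (an extension, not part of the original prefix-only standard), so parameterised URLs like the example can be matched with patterns along these lines (the exact patterns depend on which parameters should be blocked):

```
User-agent: *
# Block URLs whose query string begins with dir=
Disallow: /*?dir=
# Or, more bluntly, block every URL that carries a query string
Disallow: /*?
```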
How to prevent a directory from being accessed by search engines?
Pretty much as the question says, is there any way to stop search engines from crawling a directory? I am working on a WordPress installation for my site but don't want it to be listed in search engines until it's ready to be shown to the world. I know the simplest way is to password-protect the directory, but I had some issues when I tried to implement that, so I'd like to see if there's a way to do it without passwords. Thanks in advance.
Technical SEO | | Xee
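For reference, the robots.txt approach looks like the rule below (/dev-wordpress/ is a hypothetical directory name). Bear in mind that Disallow only discourages crawling; it does not guarantee the URLs stay out of the index if they are linked from elsewhere, which is why password protection is the more reliable option for an in-progress install:

```
User-agent: *
Disallow: /dev-wordpress/
```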