Robots.txt query

Technical SEO

Karen_Dauncey last edited by

Quick question: if this appears in a client's robots.txt file, what does it mean?

Disallow: /*/_/

Does it mean no pages can be indexed? I have checked and there are no pages in the index, but it's a new site too, so I'm not sure if this is the problem.

Thanks

Karen
Karen_Dauncey last edited by

Thank you so much, that is a great help!
William.Lau @Karen_Dauncey last edited by

That rule blocks compliant spiders from crawling any URL whose path has a /_/ segment after the first folder, for example /shop/_/cart/. On its own, it does not block the rest of the site. I am not sure who added the /*/_/ rule or why, but unless there is something there they don't want crawled, it is not necessary to keep it.

One thing you might want to keep in mind as well: just because you block something in robots.txt doesn't mean a spider can't still go there.

Sometimes they don't listen to robots.txt (looking at you, Baidu).
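
For anyone who wants to check which URLs a rule like this would catch, here is a rough Python sketch. It assumes the wildcard behaviour that Google and other major crawlers document (`*` matches any run of characters, a trailing `$` anchors the end of the path, and rules match from the start of the path). The helper names `rule_to_regex` and `is_blocked` and the sample paths are made up for illustration; this is not how any particular crawler is implemented.

```python
import re


def rule_to_regex(rule: str) -> re.Pattern:
    """Turn a robots.txt path rule into a regex (assumed wildcard semantics)."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape each literal piece, then rejoin with ".*" where the rule had "*".
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path: str, rule: str = "/*/_/") -> bool:
    """True if the URL path matches the Disallow rule from its first character."""
    return rule_to_regex(rule).match(path) is not None


# Hypothetical paths, only to show which shapes the rule matches.
for path in ["/", "/about/", "/_/tmp/", "/shop/_/cart/", "/a/b/_/c.html"]:
    print(path, "blocked" if is_blocked(path) else "allowed")
```

Under those assumptions, only paths with a `/_/` segment somewhere after the first slash-delimited part match, so ordinary pages such as the home page are not affected by this rule alone.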
Karen_Dauncey @William.Lau last edited by
```
User-agent: *
```
Thanks for your response.
William.Lau last edited by

What is the user agent?