Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
What does Disallow: /french-wines/?* actually do - robots.txt
-
Hello Mozzers - Just wondering what this robots.txt instruction means: Disallow: /french-wines/?*
Does it stop Googlebot crawling and indexing URLs in that "French Wines" folder - specifically the URLs that include a question mark?
Would it stop the crawling of deeper folders - e.g. /french-wines/rhone-region/ that include a question mark in their URL?
I think this has been done to block URLs containing query strings.
Thanks, Luke
-
Glad to help, Luke!
-
Thanks Logan for your help with this - much appreciated. Really helpful!
-
Disallow: /?* is the same thing as Disallow:/?, since the asterisk is a wildcard, both of those disallows prevent any URL that begins with /? from being crawled.
And yes, it is incredibly easy to disallow the wrong thing! The robots.txt tester in Search Console (under the Crawl menu) is very helpful for figuring out what a disallow will catch and what it will let by. I highly recommend testing any new disallows there before releasing them into the wild.
-
Thanks again Logan.
What would Disallow: /?* do because that is what the site I am looking at has implemented. Perhaps it works both ways around?
I imagine it's easy to disallow the wrong thing or possibly not disallow the right thing. Ugh.
-
Disallow: /*?
This disallow literally says to crawlers 'if a URL starts with a slash (all URLs) and has a parameter, don't crawl it'. The * is a wildcard that says anything between / and ? is applicable to the disallow.
It's very easy to disallow the wrong this especially in regards to parameters, for this reason I always do these 2 things rather than using robots.txt:
- Set the purpose of each parameter in Search Console - Go to Crawl > URL Parameters to configure for your site
- Self-referring canonicals - most people disallow URLs with parameters in robots.txt to prevent indexing, but this only prevents crawling. A self-referring canonical pointing to the root level of that URL will prevent indexing or URLs with parameters.
Hope that's helpful!
-
Thanks Logan - I was just reading: Disallow: /*? # block any URL that includes a ? (and thus a query string) - do you know why the ? comes before the * in this case?
-
Hi Luke,
You are correct that this was done to block URLs with parameters. However, since there's no wildcard (the asterisk) before the folder name, the URL would have to start with /french-wines/. This disallow is really only preventing crawling on the single URL www.yoursite.com/french-wines/ with any parameters appended.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Readd/Reindex a page that was 410'd
A script of ours had an error that caused some pages we didn't wish 410'd to be 410'd, we caught it in about 12 hours but for some pages it was too late. My question is, will those pages be reindexed again and how will that affect their page ranking will they eventually be back where they were? Would submitting a site map with them help, or what would be the best way to correct this error (submit the links to google indexer maybe?).
Intermediate & Advanced SEO | | Wana-Ryd0 -
Question about Indexing of /?limit=all
Hi, i've got your SEO Suite Ultimate installed on my site (www.customlogocases.com). I've got a relatively new magento site (around 1 year). We have recently been doing some pr/seo for the category pages, for example /custom-ipad-cases/ But when I search on google, it seems that google has indexed the /custom-ipad-cases/?limit=all This /?limit=all page is one without any links, and only has a PA of 1. Whereas the standard /custom-ipad-cases/ without the /? query has a much higher pa of 20, and a couple of links pointing towards it. So therefore I would want this particular page to be the one that google indexes. And along the same logic, this page really should be able to achieve higher rankings than the /?limit=all page. Is my thinking here correct? Should I disallow all the /? now, even though these are the ones that are indexed, and the others currently are not. I'd be happy to take the hit while it figures it out, because the higher PA pages are what I ultimately am getting links to... Thoughts?
Intermediate & Advanced SEO | | RobAus0 -
Intro to programming/coding for seo
Hello, I am currently a SEO and am looking for an Intro to programming/coding course to help me implement various technical SEO tasks for my clients and the business-as the programming dept will not help me, as they do not see the value of SEO. Could someone pls recommend an online course that would introduce me to basic concepts and also specifically, the information that would help me to enhance our SEO? I would also like to better understand APIs. Thanks so much in advance for your help! Lauren
Intermediate & Advanced SEO | | lfrazer1 -
Citation/Business Directory Question...
A company I work for has two numbers... one for the std call centre and one for tracking SEO. Now, if local citation/business directory listings have the same address but different numbers, will this affect local/other SEO results? Any help is greatly appreciated! 🙂
Intermediate & Advanced SEO | | geniusenergyltd0 -
Should comments and feeds be disallowed in robots.txt?
Hi My robots file is currently set up as listed below. From an SEO point of view is it good to disallow feeds, rss and comments? I feel allowing comments would be a good thing because it's new content that may rank in the search engines as the comments left on my blog often refer to questions or companies folks are searching for more information on. And the comments are added regularly. What's your take? I'm also concerned about the /page being blocked. Not sure how that benefits my blog from an SEO point of view as well. Look forward to your feedback. Thanks. Eddy User-agent: Googlebot Crawl-delay: 10 Allow: /* User-agent: * Crawl-delay: 10 Disallow: /wp- Disallow: /feed/ Disallow: /trackback/ Disallow: /rss/ Disallow: /comments/feed/ Disallow: /page/ Disallow: /date/ Disallow: /comments/ # Allow Everything Allow: /*
Intermediate & Advanced SEO | | workathomecareers0 -
Delete or not delete old/unanswered forum threads?
Hello everyone, here is another question for you: I have several forum postings on my websites that are pretty old and so they are sort of "dead discussion" threads. Some of those old discussion threads are still getting good views (but not new postings), and so I presume may be valuable for some users. But most of them are just answers to personal questions that I doubt someone else could be interested in. Besides that, many postings are just single, unanswered questions still waiting for an answer, forgotten, they are just sitting there, and will probably stay unanswered for years.... I don't think this may be good for SEO, am I right? How do you suggest to approach this kind of issues on forums or discussions sections on a website? I am eager to know your thoughts on all this. Thank you in advance! All the best, Fab.
Intermediate & Advanced SEO | | fablau0 -
Duplicate Content on Wordpress b/c of Pagination
On my recent crawl, there were a great many duplicate content penalties. The site is http://dailyfantasybaseball.org. The issue is: There's only one post per page. Therefore, because of wordpress's (or genesis's) pagination, a page gets created for every post, thereby leaving basically every piece of content i write as a duplicate. I feel like the engines should be smart enough to figure out what's going on, but if not, I will get hammered. What should I do moving forward? Thanks!
Intermediate & Advanced SEO | | Byron_W0 -
Block an entire subdomain with robots.txt?
Is it possible to block an entire subdomain with robots.txt? I write for a blog that has their root domain as well as a subdomain pointing to the exact same IP. Getting rid of the option is not an option so I'd like to explore other options to avoid duplicate content. Any ideas?
Intermediate & Advanced SEO | | kylesuss12