What does Disallow: /french-wines/?* actually do - robots.txt
-
Hello Mozzers - Just wondering what this robots.txt instruction means: Disallow: /french-wines/?*
Does it stop Googlebot crawling and indexing URLs in that "French Wines" folder - specifically the URLs that include a question mark?
Would it stop the crawling of deeper folders - e.g. /french-wines/rhone-region/ that include a question mark in their URL?
I think this has been done to block URLs containing query strings.
Thanks, Luke
-
Glad to help, Luke!
-
Thanks Logan for your help with this - much appreciated. Really helpful!
-
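Disallow: /?* is the same thing as Disallow: /?. The asterisk is a wildcard, and a disallow already matches everything that follows its prefix, so the trailing asterisk adds nothing. Both of those disallows prevent any URL that begins with /? from being crawled.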
And yes, it is incredibly easy to disallow the wrong thing! The robots.txt tester in Search Console (under the Crawl menu) is very helpful for figuring out what a disallow will catch and what it will let by. I highly recommend testing any new disallows there before releasing them into the wild.
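If you want to sanity-check the equivalence offline first, here is a rough Python sketch. It assumes Google-style matching (a rule is a prefix match and * matches any run of characters) and ignores the $ end anchor. The helper names to_regex and is_blocked and the sample paths are made up for illustration; this is not Google's parser.

```python
import re

def to_regex(pattern: str) -> str:
    # '*' matches any run of characters; everything else is literal.
    # (The '$' end anchor is ignored in this sketch.)
    return ".*".join(re.escape(part) for part in pattern.split("*"))

def is_blocked(pattern: str, path: str) -> bool:
    # re.match anchors at the start only, which gives prefix matching.
    return re.match(to_regex(pattern), path) is not None

# Hypothetical paths (path plus query string), for illustration only.
paths = ["/?s=merlot", "/", "/french-wines/", "/french-wines/?page=2"]

with_star = [is_blocked("/?*", p) for p in paths]
without_star = [is_blocked("/?", p) for p in paths]

print(with_star == without_star)  # True
print(with_star)                  # [True, False, False, False]
```

Under those assumptions both rules catch only URLs that start with /?, so a URL like /french-wines/?page=2 is not caught by either of them.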
-
Thanks again Logan.
What would Disallow: /?* do? That is what the site I am looking at has implemented. Perhaps it works both ways around?
I imagine it's easy to disallow the wrong thing or possibly not disallow the right thing. Ugh.
-
Disallow: /*?
This disallow literally says to crawlers 'if a URL starts with a slash (all URLs) and has a parameter, don't crawl it'. The * is a wildcard that says anything between / and ? is applicable to the disallow.
It's very easy to disallow the wrong thing, especially with regard to parameters. For this reason I always do these 2 things rather than using robots.txt:
- Set the purpose of each parameter in Search Console - Go to Crawl > URL Parameters to configure for your site
- Self-referring canonicals - most people disallow URLs with parameters in robots.txt to prevent indexing, but this only prevents crawling. A self-referring canonical pointing to the root level of that URL (the version without parameters) will prevent indexing of URLs with parameters.
Hope that's helpful!
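For anyone who wants to see what Disallow: /*? would catch before touching a live file, here is a small Python sketch. It assumes Google-style matching (prefix match, * as a wildcard) and skips the $ end anchor. The function blocked_paths and the sample paths are hypothetical, not taken from any real site or tool.

```python
import re

def blocked_paths(pattern, paths):
    """Return the paths that a single Disallow pattern would catch."""
    # Treat '*' as "any run of characters" and everything else as literal text.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    # re.match anchors at the start of the path, i.e. a prefix match.
    return [p for p in paths if re.match(regex, p)]

# Hypothetical paths (path plus query string), for illustration only.
sample = [
    "/french-wines/",
    "/french-wines/?sort=price",
    "/french-wines/rhone-region/?page=2",
    "/?s=merlot",
]

print(blocked_paths("/*?", sample))
# ['/french-wines/?sort=price', '/french-wines/rhone-region/?page=2', '/?s=merlot']
```

In this sketch every path containing a question mark is caught, at any folder depth, while the clean /french-wines/ URL is left alone.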
-
Thanks Logan - I was just reading: Disallow: /*? # block any URL that includes a ? (and thus a query string) - do you know why the ? comes before the * in this case?
-
Hi Luke,
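You are correct that this was done to block URLs with parameters. However, since there's no wildcard (the asterisk) before the folder name, the URL would have to start with /french-wines/. This disallow is really only preventing crawling on the single URL www.yoursite.com/french-wines/ with any parameters appended. Deeper folders such as /french-wines/rhone-region/ are not covered, even with a question mark in the URL, because the question mark has to come immediately after /french-wines/.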
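To illustrate that reading, here is a minimal Python sketch of Google-style robots.txt path matching: a rule is a prefix match, * matches any run of characters, and a trailing $ anchors the end. The function name rule_matches and the sample paths are assumptions for illustration, and this is not Google's actual parser.

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a path (query string included)."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for each '*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start only, so an unanchored rule is a prefix match.
    return re.match(regex, path) is not None

rule = "/french-wines/?*"

print(rule_matches(rule, "/french-wines/?page=2"))               # True
print(rule_matches(rule, "/french-wines/rhone-region/?page=2"))  # False
print(rule_matches(rule, "/french-wines/"))                      # False
```

Under these assumptions only the folder URL with a query string attached is caught; the deeper Rhone URL is not.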