What does Disallow: /french-wines/?* actually do - robots.txt
-
Hello Mozzers - Just wondering what this robots.txt instruction means: Disallow: /french-wines/?*
Does it stop Googlebot crawling and indexing URLs in that "French Wines" folder - specifically the URLs that include a question mark?
Would it stop the crawling of deeper folders - e.g. /french-wines/rhone-region/ that include a question mark in their URL?
I think this has been done to block URLs containing query strings.
Thanks, Luke
-
Glad to help, Luke!
-
Thanks Logan for your help with this - much appreciated. Really helpful!
-
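Disallow: /?* is the same thing as Disallow: /?. Robots.txt rules already match any URL that begins with the given path, so a trailing asterisk (the wildcard) adds nothing. Both of those disallows prevent any URL that begins with /? (for example, the homepage with parameters appended) from being crawled.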
And yes, it is incredibly easy to disallow the wrong thing! The robots.txt tester in Search Console (under the Crawl menu) is very helpful for figuring out what a disallow will catch and what it will let by. I highly recommend testing any new disallows there before releasing them into the wild.
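If you want a quick local sanity check before (not instead of) the Search Console tester, here is a minimal Python sketch of Google-style rule matching. It is an illustration under stated assumptions: `rule_matches` is a hypothetical helper, it treats `*` as "any characters" and a trailing `$` as an end anchor, and it ignores Allow rules and rule precedence. Python's built-in urllib.robotparser does plain prefix matching and, as far as I know, does not expand `*` wildcards, which is why the sketch uses its own matcher. It compares /?* and /? on a few sample paths.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Hypothetical helper: does a robots.txt path pattern match this path?

    Sketch of Google-style matching (an assumption, not an official
    implementation): '*' matches any run of characters, a trailing '$'
    anchors the end, and everything else is a literal prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for every '*'.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match only matches at the start of the string: the implicit prefix rule.
    return re.match(regex, path) is not None


# Sample URL paths (path plus query string) to compare the two rules on.
samples = ["/?color=red", "/?", "/", "/french-wines/?page=2"]
for path in samples:
    print(path, rule_matches("/?*", path), rule_matches("/?", path))
```

Both columns come out identical for every sample: a rule ending in * matches exactly what the same rule without the * matches.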
-
Thanks again Logan.
What would Disallow: /?* do? That is what the site I am looking at has implemented. Perhaps it works both ways around?
I imagine it's easy to disallow the wrong thing or possibly not disallow the right thing. Ugh.
-
Disallow: /*?
This disallow literally tells crawlers: 'if a URL starts with a slash (which all URLs do) and contains a question mark (i.e. has parameters), don't crawl it'. The * is a wildcard meaning that anything between the / and the ? is covered by the disallow.
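To see what that means in practice, here is a minimal Python sketch that checks /*? against a few made-up paths. The same assumptions apply: `rule_matches` is a hypothetical helper approximating Google-style wildcard matching, not an official implementation, and it ignores Allow rules.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Hypothetical helper approximating Google-style robots.txt matching.

    '*' matches any run of characters, a trailing '$' anchors the end,
    and the pattern must match from the start of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None


# Illustrative paths: only the ones containing '?' should be blocked by /*?
paths = [
    "/french-wines/?page=2",
    "/french-wines/rhone-region/?sort=price",
    "/?q=1",
    "/french-wines/rhone-region/",
    "/about",
]
for path in paths:
    print(path, "blocked" if rule_matches("/*?", path) else "crawlable")
```

This is also why the wildcard sits between the slash and the question mark: every path starts with /, so /*? reaches a ? anywhere in the URL, whereas /?* only reaches a ? that comes immediately after the first slash.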
It's very easy to disallow the wrong thing, especially with regard to parameters. For this reason, I always do these two things rather than using robots.txt:
- Set the purpose of each parameter in Search Console - Go to Crawl > URL Parameters to configure for your site
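- Self-referring canonicals - most people disallow URLs with parameters in robots.txt to prevent indexing, but this only prevents crawling. A canonical tag on each parameterized URL pointing to the root level of that URL (the version without parameters) should keep URLs with parameters out of the index. Note that Google treats canonicals as a strong hint rather than a directive.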
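For the canonical approach, here is a minimal Python sketch of deriving the parameter-free URL and the matching link element. It assumes every parameter on the page is safe to drop (no pagination or other parameters that change the content), and `canonical_url` / `canonical_link_tag` are hypothetical helper names, not part of any CMS or library. In practice your CMS or page template would emit the tag.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Hypothetical helper: drop the query string and fragment from a URL."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> element for the page's <head>."""
    return f'<link rel="canonical" href="{escape(canonical_url(url))}">'


print(canonical_link_tag("https://www.yoursite.com/french-wines/rhone-region/?sort=price"))
# prints <link rel="canonical" href="https://www.yoursite.com/french-wines/rhone-region/">
```

On the clean URL itself the function returns the URL unchanged, which is what makes the canonical self-referring there.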
Hope that's helpful!
-
Thanks Logan - I was just reading: Disallow: /*? # block any URL that includes a ? (and thus a query string) - do you know why the ? comes before the * in this case?
-
Hi Luke,
You are correct that this was done to block URLs with parameters. However, since there's no wildcard (the asterisk) between /french-wines/ and the question mark, the URL would have to start with /french-wines/?. This disallow is really only preventing crawling of the single URL www.yoursite.com/french-wines/ with any parameters appended. Deeper URLs that include a question mark, like /french-wines/rhone-region/ with parameters, would not be blocked by this rule.
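Here is a minimal Python sketch that makes the difference concrete. It compares the site's rule, /french-wines/?*, with a hypothetical alternative, /french-wines/*?, shown only for contrast (not a recommendation; test any real change in Search Console first). `rule_matches` is a hypothetical helper approximating Google-style wildcard matching, and the paths are made up.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Hypothetical helper approximating Google-style robots.txt matching.

    '*' matches any run of characters, a trailing '$' anchors the end,
    and the pattern must match from the start of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None


rules = ["/french-wines/?*", "/french-wines/*?"]  # site's rule vs. hypothetical alternative
paths = [
    "/french-wines/?page=2",
    "/french-wines/rhone-region/?sort=price",
    "/french-wines/rhone-region/",
]
for rule in rules:
    for path in paths:
        print(rule, path, rule_matches(rule, path))
```

Only the first rule's behaviour matters for the site in question: it blocks /french-wines/?page=2 but lets the deeper /french-wines/rhone-region/?sort=price through, while the alternative would catch both.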
Related Questions
-
I have a metadata issue. My site crawl is coming back with missing descriptions, but all of the pages look like site tags (i.e. /blog/?_sft_tag=call-routing)
Intermediate & Advanced SEO | | amarieyoussef0 -
Redirect wordpress from /%post_id%/%postname%/ to /blog/%postname%/
Hi, what is the code to redirect a WordPress blog from site.com/%post_id%/%postname%/ to site.com/blog/%postname%/? We are moving the site to a new server and a new URL structure. Thanks in advance.
Intermediate & Advanced SEO | | Taiger0 -
Recovering from robots.txt error
Hello, A client of mine is going through a bit of a crisis. A developer (at their end) added Disallow: / to the robots.txt file. Luckily the SEOMoz crawl ran a couple of days after this happened and alerted me to the error. The robots.txt file was quickly updated, but the client has found the vast majority of their rankings have gone. It took a further 5 days for GWMT to register that the robots.txt file had been updated, and since then we have "Fetched as Google" and "Submitted URL and linked pages" in GWMT. In GWMT it is still showing that the vast majority of pages are blocked in the "Blocked URLs" section, although the robots.txt file below it is now OK. I guess what I want to ask is: What else is there that we can do to recover these rankings quickly? What time scales can we expect for recovery? More importantly, has anyone had any experience with this sort of situation, and is full recovery normal? Thanks in advance!
Intermediate & Advanced SEO | | RikkiD220 -
If I hired you/your company to do my SEO ...
If I hired you or your company to do SEO for my site (http://goo.gl/XUH3f), what would be the first steps you'd take? I'm pretty sure I've covered all of the basics myself; I'm just left trying to figure out what I should do next... Rankings have been going up and down for the last few weeks, but even when they're up, they're not high enough 🙂 (and then they go back down anyway)... I know some of you are going to say build links; if so, please at least give me an example of one or two sites you'd try to get to link to mine... I'm open to any advice or feedback, as I'm just a website owner who's been doing their own SEO and learning on the fly... Thanks a lot!
Intermediate & Advanced SEO | | Prime850 -
Disallow my store in robots.txt?
Should I disallow my store directory in robots.txt? Here is the URL: https://www.stdtime.com/store/ Here are my reasons for suggesting this:
- SEOMOZ finds crawl "errors" in there that I don't care about.
- I don't think I care if the search engines index those pages.
- I only have one product, and it is not an impulse buy.
- My product has a 60-day sales cycle, so price is less important than features.
Intermediate & Advanced SEO | | raywhite0 -
Duplicate Content / 301 redirect Article issue
Hello, We've got some articles floating around on our site nlpca(dot)com like this article: http://www.nlpca.com/what-is-dynamic-spin-release.html that is not linked to from anywhere else. The article exists as it's supposed to here: http://www.dynamicspinrelease.com/what-is-dsr/ (our other website). Would it be safe, in the eyes of both Google's algorithm (as much as you know) and Panda, to just 301 redirect from http://www.nlpca.com/what-is-dynamic-spin-release.html to http://www.dynamicspinrelease.com/what-is-dsr/, or would no-indexing be better? Thank you!
Intermediate & Advanced SEO | | BobGW0 -
Problem w/ 301 Redirect
Here is how I did the configuration of the redirects: I don't understand why the destination page is different from the one configured in the Apache server. Any ideas? For example: http://www.meliacaribetropical.com/spanish/entertainment/ is already being 301 redirected to a 404 page (http://www.meliacaribetropical.com/es/index.htmlentertainment/) that does not exist on the Apache server. As you can see, the URL was incorrectly written. Another occurrence from the spreadsheet is http://www.meliacaribetropical.com/spanish/gallery/beach.html, which is also being 301 redirected to a 404 page (http://www.meliacaribetropical.com/es/index.htmlgallery/beach.html). This is causing a hard 404 page. Here is my .htaccess file:
<VirtualHost 192.168.200.25:80>
ServerAdmin ecommerce@sol-group.com
DocumentRoot /home/www/solgroup/americas/meliacaribetropical.com
ServerName www.meliacaribetropical.com
ServerAlias meliacaribetropical.com
Redirect permanent /spanish/services/ http://www.meliacaribetropical.com/en/services.html
Redirect permanent /entertainment/ http://www.meliacaribetropical.com/en/services.html
Redirect permanent /press/ http://www.meliacaribetropical.com/en/index.html
Redirect permanent /spanish/ http://www.meliacaribetropical.com/es/index.html
Redirect permanent /es/restaurantes/ http://www.meliacaribetropical.com/es/gastronomia.html
Redirect permanent /spanish/entertainment/ http://www.meliacaribetropical.com/es/servicios.html
Redirect permanent /spanish/services/ http://www.meliacaribetropical.com/es/servicios.html
Redirect permanent /es/spa/ http://www.meliacaribetropical.com/es/servicios.html
Redirect permanent /spanish/accommodations/ http://www.meliacaribetropical.com/es/habitaciones.html
Redirect permanent /spanish/spa/ http://www.meliacaribetropical.com/es/servicios.html
Redirect permanent /spanish/royal/ http://www.meliacaribetropical.com/es/servicio-real.html
Redirect permanent /spanish/dining/ http://www.meliacaribetropical.com/es/gastronomia.html
Redirect permanent /spanish/flintstones/ http://www.meliacaribetropical.com/es/index.html
Redirect permanent /es/galeria/ http://www.meliacaribetropical.com/es/visor.html
Redirect permanent /spanish/gallery/ http://www.meliacaribetropical.com/es/visor.html
Redirect permanent /es/reuniones-eventos/ http://www.meliacaribetropical.com/es/grupos.html
Redirect permanent /lowest-rate.php http://www.meliacaribetropical.com/en/index.html
Redirect permanent /es/los-picapiedra/ http://www.meliacaribetropical.com/es/index.html
Redirect permanent /gallery/beach.html http://www.meliacaribetropical.com/en/index.html
Redirect permanent /gallery/dining.html http://www.meliacaribetropical.com/en/gastronomy.html
Redirect permanent /gallery/pools.html http://www.meliacaribetropical.com/en/index.html
Redirect permanent /spanish/sitemap.html http://www.meliacaribetropical.com/es/index.html
Redirect permanent /es/galeria/playa.html http://www.meliacaribetropical.com/es/index.html
Redirect permanent /es/galeria/restaurantes.html http://www.meliacaribetropical.com/es/gastronomia.html
Redirect permanent /es/galeria/piscinas.html http://www.meliacaribetropical.com/es/index.html
Redirect permanent /es/prensa/ http://www.meliacaribetropical.com/es/index.html
Redirect permanent /spanish/gallery/beach.html http://www.meliacaribetropical.com/es/index.html
Redirect permanent /spanish/gallery/pools.html http://www.meliacaribetropical.com/es/index.html
Redirect permanent /spanish/gallery/dining.html http://www.meliacaribetropical.com/es/gastronomia.html
Redirect permanent /spanish/press/ http://www.meliacaribetropical.com/es/index.html
Redirect permanent /en/groups.html http://www.meliacaribetropical.com/en/groups.html
Redirect permanent /terms-condition.php http://www.meliacaribetropical.com/en/index.html
Redirect permanent /es/all_inclusive.html http://www.meliacaribetropical.com/es/index.html
Redirect permanent /es/terms-condition.php http://www.meliacaribetropical.com/es/index.html
Redirect permanent /es/prensa/family-facilities-amenities.html http://www.meliacaribetropical.com/es/index.html
Redirect permanent /es/outside-us-telephone-listing.php http://www.meliacaribetropical.com/en/index.html
Redirect permanent /spanish/press/melia-international-brand-overhaul.html http://www.meliacaribetropical.com/es/index.html
Redirect permanent /es/prensa/melia-international-brand-overhaul.html http://www.meliacaribetropical.com/es/index.html
Redirect permanent /press/family-facilities-amenities.html http://www.meliacaribetropical.com/en/index.html
Redirect permanent /press/melia-international-brand-overhaul.html http://www.meliacaribetropical.com/en/index.html
Redirect permanent /es/prensa/melia-caribe-tropical-announces-fall-promotion.html http://www.meliacaribetropical.com/es/index.html
Redirect permanent /spanish/press/melia-caribe-tropical-announces-fall-promotion.html http://www.meliacaribetropical.com/es/index.html
Redirect permanent /press/melia-caribe-tropical-announces-fall-promotion.html http://www.meliacaribetropical.com/en/index.html
</VirtualHost>
Intermediate & Advanced SEO | | Melia0 -
Domain w/ Identical Content to Site we are Optimizing
Hi Guys, We've been optimizing a client's site for about a year or so now, and on a call the other day the client brought up that he owns and operates another site that's marketing the same product, but to a different audience (we work on the direct-to-consumer side; this is a distributor-focused site), with the same exact content as the site we are optimizing. Obviously this is a major duplicate content issue and we need to get it resolved very quickly. We have already recommended to the client that we rewrite content, but this is where my question comes in: which site should we rewrite the content on? The site we are optimizing is the more important of the two. While we still want the other site to hold rankings, we don't want to end up accidentally optimizing the other site, wherein the site we are working on full time suffers a loss when a "competing" site creates completely new content and may, accidentally, end up ranking higher than the site we are focusing on full time. As links also play a role, would that be a KPI to look at here in determining which site gets new content and which does not? In this scenario, what would you guys recommend? Just want to make sure I'm dotting all my I's and crossing my T's here. Many thanks to all in advance, Mike
Intermediate & Advanced SEO | | Havas_Disco0