Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
What does Disallow: /french-wines/?* actually do - robots.txt
-
Hello Mozzers - Just wondering what this robots.txt instruction means: Disallow: /french-wines/?*
Does it stop Googlebot crawling and indexing URLs in that "French Wines" folder - specifically the URLs that include a question mark?
Would it stop the crawling of deeper folders - e.g. /french-wines/rhone-region/ that include a question mark in their URL?
I think this has been done to block URLs containing query strings.
Thanks, Luke
-
Glad to help, Luke!
-
Thanks Logan for your help with this - much appreciated. Really helpful!
-
Disallow: /?* is the same thing as Disallow:/?, since the asterisk is a wildcard, both of those disallows prevent any URL that begins with /? from being crawled.
And yes, it is incredibly easy to disallow the wrong thing! The robots.txt tester in Search Console (under the Crawl menu) is very helpful for figuring out what a disallow will catch and what it will let by. I highly recommend testing any new disallows there before releasing them into the wild.
-
Thanks again Logan.
What would Disallow: /?* do because that is what the site I am looking at has implemented. Perhaps it works both ways around?
I imagine it's easy to disallow the wrong thing or possibly not disallow the right thing. Ugh.
-
Disallow: /*?
This disallow literally says to crawlers 'if a URL starts with a slash (all URLs) and has a parameter, don't crawl it'. The * is a wildcard that says anything between / and ? is applicable to the disallow.
It's very easy to disallow the wrong this especially in regards to parameters, for this reason I always do these 2 things rather than using robots.txt:
- Set the purpose of each parameter in Search Console - Go to Crawl > URL Parameters to configure for your site
- Self-referring canonicals - most people disallow URLs with parameters in robots.txt to prevent indexing, but this only prevents crawling. A self-referring canonical pointing to the root level of that URL will prevent indexing or URLs with parameters.
Hope that's helpful!
-
Thanks Logan - I was just reading: Disallow: /*? # block any URL that includes a ? (and thus a query string) - do you know why the ? comes before the * in this case?
-
Hi Luke,
You are correct that this was done to block URLs with parameters. However, since there's no wildcard (the asterisk) before the folder name, the URL would have to start with /french-wines/. This disallow is really only preventing crawling on the single URL www.yoursite.com/french-wines/ with any parameters appended.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Looking for opinions on structuring meta title tags/page title/menu title/H1
Hi everyone I am hoping a few of you can share your opinions. I have been having conversations (okay, healthy debates) about how to write/structure meta title tag and how to compliment them with the H1, page title, menu name. To help explain the thought processes I will use a pretend keyword. How about "screwdriver". Case: (I made this up) we are redesigning a website for a construction tools manufacturing company (pretend name: ABC Tools) targeting OEMs who are interested in purchasing large quantities of tools. The product categories (to become main menu items) are Screwdrivers, Nails, Drills, and Hammers. (bear with me .... this is just an example I am making up on the fly) K. Circling back to screwdrivers - let's say we have one landing page (a primary category page and in the main menu) listing products and great details about screwdrivers. Focus keywords are screwdriver manufacturer, screwdriver supplier, construction screwdrivers Below are questions being debated. If you are willing ... how would you address these questions? And, can you explain WHY? QUESTION ONE: How would you structure the meta title tag (feel free to write one of your own) Screwdriver Manufacturer - Construction Screwdriver | ABC Tools ABC Tools - US-based Screwdriver Manufacturer Supplier Near You High-Quality Screwdrivers for Construction with ABC Tools QUESTION TWO: how would you write the H1 on the page? Would it match the meta tag? OR, would you write something different using the primary keyword? QUESTION THREE Remembering this is not a blog post ... it is a primary landing page linked to the main navigation. What would the menu title be? (remember the product categories above are how the main menu items are bucketed) Screwdrivers Screwdriver Manufacturer Typically in WordPress, the H1 and the menu title is auto-populated using the page title (not the title tag)... So, if we use Screwdrivers as the page title but we want the H1 to match the meta title tag, would we manually change the H1? Or, have the page title and title tag match, but manually change the menu item?
Intermediate & Advanced SEO | | Brenda.Haines1 -
SEO Best Practices regarding Robots.txt disallow
I cannot find hard and fast direction about the following issue: It looks like the Robots.txt file on my server has been set up to disallow "account" and "search" pages within my site, so I am receiving warnings from the Google Search console that URLs are being blocked by Robots.txt. (Disallow: /Account/ and Disallow: /?search=). Do you recommend unblocking these URLs? I'm getting a warning that over 18,000 Urls are blocked by robots.txt. ("Sitemap contains urls which are blocked by robots.txt"). Seems that I wouldn't want that many urls blocked. ? Thank you!!
Intermediate & Advanced SEO | | jamiegriz0 -
If my website do not have a robot.txt file, does it hurt my website ranking?
After a site audit, I find out that my website don't have a robot.txt. Does it hurt my website rankings? One more thing, when I type mywebsite.com/robot.txt, it automatically redirect to the homepage. Please help!
Intermediate & Advanced SEO | | binhlai0 -
P.O Box VS. Actual Address
We have a website (http://www.delivertech.ca) that uses a P.O Box number versus an actual address as their "location". Does this affect SEO? Is it better to use an actual address? Thanks.
Intermediate & Advanced SEO | | Web3Marketing870 -
If Robots.txt have blocked an Image (Image URL) but the other page which can be indexed has this image, how is the image treated?
Hi MOZers, This probably is a dumb question but I have a case where the robots.tags has an image url blocked but this image is used on a page (lets call it Page A) which can be indexed. If the image on Page A has an Alt tags, then how is this information digested by crawlers? A) would Google totally ignore the image and the ALT tags information? OR B) Google would consider the ALT tags information? I am asking this because all the images on the website are blocked by robots.txt at the moment but I would really like website crawlers to crawl the alt tags information. Chances are that I will ask the webmaster to allow indexing of images too but I would like to understand what's happening currently. Looking forward to all your responses 🙂 Malika
Intermediate & Advanced SEO | | Malika11 -
Intro to programming/coding for seo
Hello, I am currently a SEO and am looking for an Intro to programming/coding course to help me implement various technical SEO tasks for my clients and the business-as the programming dept will not help me, as they do not see the value of SEO. Could someone pls recommend an online course that would introduce me to basic concepts and also specifically, the information that would help me to enhance our SEO? I would also like to better understand APIs. Thanks so much in advance for your help! Lauren
Intermediate & Advanced SEO | | lfrazer1 -
Disallow URLs ENDING with certain values in robots.txt?
Is there any way to disallow URLs ending in a certain value? For example, if I have the following product page URL: http://website.com/category/product1, and I want to disallow /category/product1/review, /category/product2/review, etc. without disallowing the product pages themselves, is there any shortcut to do this, or must I disallow each gallery page individually?
Intermediate & Advanced SEO | | jmorehouse0 -
Posing QU's on Google Variables "aclk", "gclid" "cd", "/aclk" "/search", "/url" etc
I've been doing a bit of stats research prompted by read the recent ranking blog http://www.seomoz.org/blog/gettings-rankings-into-ga-using-custom-variables There are a few things that have come up in my research that I'd like to clear up. The below analysis has been done on my "conversions". 1/. What does "/aclk" mean in the Referrer URL? I have noticed a strong correlation between this and "gclid" in the landing page variable. Does it mean "ad click" ?? Although they seem to "closely" correlate they don't exactly, so when I have /aclk in the referrer Url MOSTLY I have gclid in the landing page URL. BUT not always, and the same applies vice versa. It's pretty vital that I know what is the best way to monitor adwords PPC, so what is the best variable to go on? - Currently I am using "gclid", but I have about 25% extra referral URL's with /aclk in that dont have "gclid" in - so am I underestimating my number of PPC conversions? 2/. The use of the variable "cd" is great, but it is not always present. I have noticed that 99% of my google "Referrer URL's" either start with:
Intermediate & Advanced SEO | | James77
/aclk - No cd value
/search - No cd value
/url - Always contains the cd variable. What do I make of this?? Thanks for the help in advance!0