What does Disallow: /french-wines/?* actually do - robots.txt
-
Hello Mozzers - Just wondering what this robots.txt instruction means: Disallow: /french-wines/?*
Does it stop Googlebot crawling and indexing URLs in that "French Wines" folder - specifically the URLs that include a question mark?
Would it stop the crawling of deeper folders - e.g. /french-wines/rhone-region/ that include a question mark in their URL?
I think this has been done to block URLs containing query strings.
Thanks, Luke
-
Glad to help, Luke!
-
Thanks Logan for your help with this - much appreciated. Really helpful!
-
Disallow: /?* is the same thing as Disallow: /?. Rules are matched as prefixes, so the trailing asterisk (a wildcard) adds nothing: both of those disallows prevent any URL that begins with /? from being crawled.
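To make the equivalence concrete, here is a minimal Python sketch. It assumes the usual prefix-matching behaviour for robots.txt rules and only handles rules whose sole wildcard is a trailing asterisk. The function name `blocks` and the sample paths are made up for illustration.

```python
def blocks(rule: str, path_and_query: str) -> bool:
    """Prefix match for rules whose only wildcard is a trailing *."""
    # A trailing * means "followed by anything", which a prefix match already implies.
    prefix = rule.rstrip("*")
    if "*" in prefix or "$" in prefix:
        raise ValueError("this sketch only handles a trailing wildcard")
    return path_and_query.startswith(prefix)


# Hypothetical paths (path plus query string, as a crawler would compare them).
samples = ["/?s=bordeaux", "/", "/french-wines/?page=2"]

for path in samples:
    # Both columns agree for every sample.
    print(path, blocks("/?*", path), blocks("/?", path))
```

Only the first sample is blocked by either rule, because it is the only one that begins with /?.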
And yes, it is incredibly easy to disallow the wrong thing! The robots.txt tester in Search Console (under the Crawl menu) is very helpful for figuring out what a disallow will catch and what it will let by. I highly recommend testing any new disallows there before releasing them into the wild.
-
Thanks again Logan.
What would Disallow: /?* do? That is what the site I am looking at has implemented. Perhaps it works both ways around?
I imagine it's easy to disallow the wrong thing or possibly not disallow the right thing. Ugh.
-
Disallow: /*?
This disallow literally says to crawlers 'if a URL starts with a slash (all URLs) and has a parameter, don't crawl it'. The * is a wildcard, so anything between the / and the ? is covered by the disallow.
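As a rough illustration of how that wildcard behaves, the sketch below translates a rule into a regular expression, assuming the common convention that * matches any run of characters and a trailing $ anchors the end of the URL. The helper names `rule_to_regex` and `is_disallowed` and the test paths are assumptions, not part of any crawler's actual code.

```python
import re


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a robots.txt path rule into an anchored-at-start regex."""
    anchored_end = rule.endswith("$")
    body = rule[:-1] if anchored_end else rule
    # Escape the literal pieces and let each * match any run of characters.
    pattern = ".*".join(re.escape(piece) for piece in body.split("*"))
    return re.compile(pattern + ("$" if anchored_end else ""))


def is_disallowed(rule: str, path_and_query: str) -> bool:
    # re.match only matches at the start, which mirrors prefix matching.
    return rule_to_regex(rule).match(path_and_query) is not None


# Hypothetical paths checked against Disallow: /*?
for path in ["/french-wines/", "/french-wines/?page=2",
             "/french-wines/rhone-region/?sort=price", "/?s=bordeaux"]:
    print(path, is_disallowed("/*?", path))
```

Under these assumptions, every path that contains a question mark is caught, however deep the folder, and the plain /french-wines/ path is not.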
It's very easy to disallow the wrong thing, especially with regard to parameters. For this reason I always do these 2 things rather than using robots.txt:
- Set the purpose of each parameter in Search Console - Go to Crawl > URL Parameters to configure for your site
- Self-referring canonicals - most people disallow URLs with parameters in robots.txt to prevent indexing, but this only prevents crawling. A self-referring canonical pointing to the root level of that URL (the version without parameters) will help prevent indexing of URLs with parameters.
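For the canonical approach, a template could build the tag along these lines. This is only a sketch: it assumes the parameter-free URL is the version you want indexed, and the function name `canonical_link_tag` and the example URL are hypothetical.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_link_tag(url: str) -> str:
    """Build a canonical tag pointing at the URL with its query string removed."""
    parts = urlsplit(url)
    # Keep scheme, host and path; drop the query string and fragment.
    clean = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return f'<link rel="canonical" href="{escape(clean, quote=True)}" />'


# Hypothetical parameterized URL; the same tag would go on the clean URL too.
print(canonical_link_tag("https://www.yoursite.com/french-wines/?sort=price"))
```

The tag only works if crawlers can fetch the page and read it, which is one reason to use it instead of a robots.txt disallow rather than alongside one.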
Hope that's helpful!
-
Thanks Logan - I was just reading: Disallow: /*? # block any URL that includes a ? (and thus a query string) - do you know why the ? comes before the * in this case?
-
Hi Luke,
You are correct that this was done to block URLs with parameters. However, since there's no wildcard (the asterisk) before the question mark, the URL would have to start with /french-wines/? to match. This disallow is really only preventing crawling on the single URL www.yoursite.com/french-wines/ with any parameters appended. Deeper folders such as /french-wines/rhone-region/ are not blocked, with or without a question mark in the URL.
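Here is a quick way to sanity-check that reading in Python. It is a sketch that assumes plain prefix matching on the path plus query string; the name `is_blocked` and the colour=red parameter are invented for the example.

```python
from urllib.parse import urlsplit

# Disallow: /french-wines/?* with the redundant trailing * dropped.
RULE_PREFIX = "/french-wines/?"


def is_blocked(url: str) -> bool:
    """Check a full URL against the rule using its path and query string."""
    parts = urlsplit(url)
    target = parts.path + ("?" + parts.query if parts.query else "")
    return target.startswith(RULE_PREFIX)


# Hypothetical URLs on the example host used above.
urls = [
    "https://www.yoursite.com/french-wines/",
    "https://www.yoursite.com/french-wines/?colour=red",
    "https://www.yoursite.com/french-wines/rhone-region/",
    "https://www.yoursite.com/french-wines/rhone-region/?colour=red",
]

for url in urls:
    print(url, is_blocked(url))
```

Only the second URL comes back as blocked, since it is the only one where the question mark directly follows /french-wines/.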