Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content (many posts will still be possible to view), we have locked both new posts and new replies.
What does Disallow: /french-wines/?* actually do - robots.txt
-
Hello Mozzers - Just wondering what this robots.txt instruction means: Disallow: /french-wines/?*
Does it stop Googlebot crawling and indexing URLs in that "French Wines" folder - specifically the URLs that include a question mark?
Would it also stop the crawling of URLs in deeper folders, e.g. /french-wines/rhone-region/, that include a question mark?
I think this has been done to block URLs containing query strings.
Thanks, Luke
-
Glad to help, Luke!
-
Thanks Logan for your help with this - much appreciated. Really helpful!
-
Disallow: /?* is the same thing as Disallow: /? because the trailing asterisk is a wildcard that adds nothing to a prefix match. Both of those disallows prevent any URL that begins with /? from being crawled.
And yes, it is incredibly easy to disallow the wrong thing! The robots.txt tester in Search Console (under the Crawl menu) is very helpful for figuring out what a disallow will catch and what it will let by. I highly recommend testing any new disallows there before releasing them into the wild.
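For anyone without access to a tester, here is a minimal Python sketch of the matching rules discussed in this thread. It assumes the commonly documented behaviour (a rule is matched from the start of the path, `*` matches any run of characters, a trailing `$` anchors the end) and is not Googlebot's actual implementation. The function name `rule_matches` and the sample paths are made up for illustration.

```python
import re


def rule_matches(pattern: str, url_path: str) -> bool:
    """Simplified robots.txt matching (an approximation, not Google's code).

    '*' matches any run of characters, a trailing '$' anchors the end,
    and everything else is treated as a literal prefix of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for each '*'.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match only matches at the start of the string, i.e. prefix matching.
    return re.match(regex, url_path) is not None


# Hypothetical paths (path plus query string, as a crawler would see them).
sample_paths = ["/?page=2", "/french-wines/?sort=price", "/french-wines/"]

for rule in ["/?*", "/?", "/*?"]:
    blocked = [p for p in sample_paths if rule_matches(rule, p)]
    print(f"Disallow: {rule:<4} blocks {blocked}")
```

Under these assumptions, `/?*` and `/?` block only `/?page=2`, while `/*?` blocks both URLs that carry a query string and leaves `/french-wines/` alone.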
-
Thanks again Logan.
What would Disallow: /?* do? That is what the site I am looking at has implemented. Perhaps it works both ways around?
I imagine it's easy to disallow the wrong thing or possibly not disallow the right thing. Ugh.
-
Disallow: /*?
This disallow literally says to crawlers 'if a URL starts with a slash (all URLs) and has a parameter, don't crawl it'. The * is a wildcard that says anything between / and ? is applicable to the disallow.
It's very easy to disallow the wrong thing, especially with parameters. For this reason I always do these 2 things rather than using robots.txt:
- Set the purpose of each parameter in Search Console - Go to Crawl > URL Parameters to configure for your site
- Self-referring canonicals - most people disallow URLs with parameters in robots.txt to prevent indexing, but this only prevents crawling. A self-referring canonical pointing to the root level of that URL (the version without parameters) helps prevent indexing of URLs with parameters.
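As a rough illustration of the canonical approach, the Python sketch below strips the query string from a URL and builds the matching `<link rel="canonical">` tag. The helper names and the example URL are hypothetical, and it assumes every parameter on the page is safe to drop. A canonical is a hint to search engines rather than a directive, and the page has to remain crawlable for the tag to be seen.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Return the URL with its query string and fragment removed."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link> element to place in the page's <head>."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'


# Hypothetical parameterised URL for the category page.
print(canonical_link_tag("https://www.yoursite.com/french-wines/?sort=price&page=2"))
```

The same tag would be emitted for every parameter variation of the page, so they all point back to the one parameter-free URL.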
Hope that's helpful!
-
Thanks Logan - I was just reading: Disallow: /*? # block any URL that includes a ? (and thus a query string) - do you know why the ? comes before the * in this case?
-
Hi Luke,
You are correct that this was done to block URLs with parameters. However, since there's no wildcard (the asterisk) before the folder name, the URL would have to start with /french-wines/. This disallow is really only preventing crawling on the single URL www.yoursite.com/french-wines/ with any parameters appended.
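To see why the deeper folders are not covered, here is a small Python sketch that checks a few made-up paths against the rule from the question and against a variant with the wildcard moved before the question mark. It is a simplified approximation of wildcard matching (prefix match, `*` as any run of characters, optional trailing `$`), not an official parser, and `compile_rule` and `is_blocked` are hypothetical helper names.

```python
import re


def compile_rule(rule: str) -> re.Pattern:
    """Turn a Disallow value into a regex (simplified approximation)."""
    end_anchor = rule.endswith("$")
    body = rule[:-1] if end_anchor else rule
    # '*' becomes '.*'; every other character is matched literally.
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if end_anchor else ""))


def is_blocked(path: str, rule: str) -> bool:
    # match() anchors at the start of the path, which gives prefix matching.
    return compile_rule(rule).match(path) is not None


# Hypothetical paths for the site in question.
paths = [
    "/french-wines/?vintage=2015",
    "/french-wines/rhone-region/?vintage=2015",
    "/french-wines/rhone-region/",
]

for path in paths:
    as_written = is_blocked(path, "/french-wines/?*")
    wildcard_first = is_blocked(path, "/french-wines/*?")
    print(f"{path:<45} /french-wines/?* -> {as_written}   /french-wines/*? -> {wildcard_first}")
```

With these assumptions, `/french-wines/?*` only catches the first path, whereas `/french-wines/*?` also catches the parameterised URL in the deeper folder. Neither rule touches the clean `/french-wines/rhone-region/` URL.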