Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Disallow wildcard match in Robots.txt
-
This is in my robots.txt file, does anyone know what this is supposed to accomplish, it doesn't appear to be blocking URLs with question marks
Disallow: /?crawler=1
Disallow: /?mobile=1Thank you
-
This is a good reply.
Everyone gets really confused because Robots.txt has very minor, partial wildcard support and that makes people think that Robots.txt files use Regex, which they do not. Instead of having some weird half and half implementation, it would be much better IMO if the Robots.txt initiative / directive were updated to say "yes, you can use full regular expressions with regards to URL string matching".
Many people are left in a kind of silly guessing game because Google doesn't 'properly' elaborate or invest in expanding the definitions to their currently (publicly) assumed end-game.
People assume that if "*" will match any string of characters, "?" will match any individual character when used in a robots.txt file. This would make sense, but it's not the case. AFAIK there are only one or two supported wildcard characters in Robots.txt and that's why people get confused, looking for escape characters and the suchlike.
-
Hi Amanda,
Those lines tell GoogleBot not to crawl urls that have that text fragments.
For example, wont crawl: domain.com/category/product**?mobile=1**BUT, that doesnt mean that will not crawl every URL with question marks. For that, the line should be like this:
Disallow: /*?I do highly recommend you to read this guides:
About /robots.txt - Official site - Robotstxt.org
Robots.txt - Moz
Robots.txt: the ultimate guide - YOAST
The Complete Guide to Robots.txt - PORTENTHope it helps.
Best luck.
GR
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Good to use disallow or noindex for these?
Hello everyone, I am reaching out to seek your expert advice on a few technical SEO aspects related to my website. I highly value your expertise in this field and would greatly appreciate your insights.
Technical SEO | | williamhuynh
Below are the specific areas I would like to discuss: a. Double and Triple filter pages: I have identified certain URLs on my website that have a canonical tag pointing to the main /quick-ship page. These URLs are as follows: https://www.interiorsecrets.com.au/collections/lounge-chairs/quick-ship+black
https://www.interiorsecrets.com.au/collections/lounge-chairs/quick-ship+black+fabric Considering the need to optimize my crawl budget, I would like to seek your advice on whether it would be advisable to disallow or noindex these pages. My understanding is that by disallowing or noindexing these URLs, search engines can avoid wasting resources on crawling and indexing duplicate or filtered content. I would greatly appreciate your guidance on this matter. b. Page URLs with parameters: I have noticed that some of my page URLs include parameters such as ?variant and ?limit. Although these URLs already have canonical tags in place, I would like to understand whether it is still recommended to disallow or noindex them to further conserve crawl budget. My understanding is that by doing so, search engines can prevent the unnecessary expenditure of resources on indexing redundant variations of the same content. I would be grateful for your expert opinion on this matter. Additionally, I would be delighted if you could provide any suggestions regarding internal linking strategies tailored to my website's structure and content. Any insights or recommendations you can offer would be highly valuable to me. Thank you in advance for your time and expertise in addressing these concerns. I genuinely appreciate your assistance. If you require any further information or clarification, please let me know. I look forward to hearing from you. Cheers!0 -
Is sitemap required on my robots.txt?
Hi, I know that linking your sitemap from your robots.txt file is a good practice. Ok, but... may I just send my sitemap to search console and forget about adding ti to my robots.txt? That's my situation: 1 multilang platform which means... ... 2 set of pages. One for each lang, of course But my CMS (magento) only allows me to have 1 robots.txt file So, again: may I have a robots.txt file woth no sitemap AND not suffering any potential SEO loss? Thanks in advance, Juan Vicente Mañanas Abad
Technical SEO | | Webicultors0 -
Blocked jquery in Robots.txt, Any SEO impact?
I've heard that Google is now indexing links and stuff available in javascript and jquery. My webmastertools is showing that some links are blocked in robots.txt of jquery. Sorry I'm not a developer or designer. I want to know is there any impact of this on my SEO? and also how can I unblock it for the robots? Check this screenshot: http://i.imgur.com/3VDWikC.png
Technical SEO | | hammadrafique0 -
Google indexing despite robots.txt block
Hi This subdomain has about 4'000 URLs indexed in Google, although it's blocked via robots.txt: https://www.google.com/search?safe=off&q=site%3Awww1.swisscom.ch&oq=site%3Awww1.swisscom.ch This has been the case for almost a year now, and it does not look like Google tends to respect the blocking in http://www1.swisscom.ch/robots.txt Any clues why this is or what I could do to resolve it? Thanks!
Technical SEO | | zeepartner0 -
Adding multi-language sitemaps to robots.txt
I am working on a revamped multi-language site that has moved to Magento. Each language runs off the core coding so there are no sub-directories per language. The developer has created sitemaps which have been uploaded to their respective GWT accounts. They have placed the sitemaps in new directories such as: /sitemap/uk/sitemap.xml /sitemap/de/sitemap.xml I want to add the sitemaps to the robots.txt but can't figure out how to do it. Also should they have placed the sitemaps in a single location with the file identifying each language: /sitemap/uk-sitemap.xml /sitemap/de-sitemap.xml What is the cleanest way of handling these sitemaps and can/should I get them on robots.txt?
Technical SEO | | MickEdwards0 -
Block Domain in robots.txt
Hi. We had some URLs that were indexed in Google from a www1-subdomain. We have now disabled the URLs (returning a 404 - for other reasons we cannot do a redirect from www1 to www) and blocked via robots.txt. But the amount of indexed pages keeps increasing (for 2 weeks now). Unfortunately, I cannot install Webmaster Tools for this subdomain to tell Google to back off... Any ideas why this could be and whether it's normal? I can send you more domain infos by personal message if you want to have a look at it.
Technical SEO | | zeepartner0 -
Robots.txt and canonical tag
In the SEOmoz post - http://www.seomoz.org/blog/robot-access-indexation-restriction-techniques-avoiding-conflicts, it's being said - If you have a robots.txt disallow in place for a page, the canonical tag will never be seen. Does it so happen that if a page is disallowed by robots.txt, spiders DO NOT read the html code ?
Technical SEO | | seoug_20050 -
SEO Benefit from Redirecting New Exact Match Domains?
Hi, All! This is a question asked in the old Q & A section, but the answer was a little ambiguous and it was about 3 years ago, so I decided to repost and let the knowledgeable SEO public answer... From David LaFerney: It’s clear that it’s much easier to get high rankings for a term if your domain is an exact match for the query. If you own several such domains that are very related such as – investmentrealestate.com, positivecashflow.com, and rentalproperty.com – would you be able to benefit from those by 301ing them to a single site, or would you have to maintain separate sites to help capture those targeted phrases? In a nutshell – SEO wise, is it worth owning multiple domains to exactly match valuable search phrases? Or do you lose the exact match benefit when you redirect?>> To clarify: redirecting an old domain with lots of history and links to a new exact match domain seems to contain SEO benefit. (You get links+exact match domain, approximately.) But the other way around? Redirecting a new exact match domain to an older domain with links? Does that do anything for the ranking of the old domain for the exact match keyword? Or absolutely nothing? (My impression has been that it's nothing, but the question came up for a client and I just wanted to make sure I wasn't missing something.) Thanks in advance!
Technical SEO | | debi_zyx0