Robots.txt file question? NEver seen this command before
-
Hey Everyone!
Perhaps someone can help me. I came across this command in the robots.txt file of our Canadian corporate domain. I looked around online but can't seem to find a definitive answer (slightly relevant).
the command line is as follows:
Disallow: /*?*
I'm guessing this might have something to do with blocking php string searches on the site?. It might also have something to do with blocking sub-domains, but the "?" mark puzzles me
Any help would be greatly appreciated!
Thanks, Rob
-
I don't think this is correct.
? is an attempt at using a RegEx in Robots file which I don't think works.
Further, if it was a properly formed regex, it would be ?
- is a special character for the user agent to mean all. For the disallow line, I believe you have to use a specific directory or page.
http://www.robotstxt.org/robotstxt.html
I could be wrong, but the info on this site has been my understanding from the past too.
-
It depends on how your site is structured.
For example if you have a page at
http://www.yourdomain.com/products.php
and this shows different things based on the parameter, like:
http://www.yourdomain.com/products.php?type=widgets
You will want to get rid of this line in your robots.txt
However if the parameter(s) doesn't change the content on the page, you can leave it in.
-
Thanks Ryan and Ryan! I'm just unfamiliar with this command set in the robots file, and getting settled into the company (5 weeks).. so I am still learning the site's structure and arch. With it all being new to me with limitations I am seeing from the CMS side, I was wondering if this might have been causing crawl issues for Bing and or Yahoo... I'm trying to gauge where we might be experiencing problems with the sites crawl functions.
-
Its not a bad idea in the robots.txt, but unless you are 100% confidant that you wont block something that you really want, i would consider just handling unwanted parameters and pages through the new Google Webmaster url handling toolset. that way you have more control over which ones do and dont get blocked.
-
So, for this parameter, should I keep it in the robots file?
-
Its preventing spiders from crawling pages with parameters in the URL. For example when you search on google you'll see a URL like so:
http://www.google.com/search?q=seo
This passes the parameter of q with a value of 'seo' to the page at google.com for it to work its magic with. This is almost definitely a good thing, unless the only way to access some content on your site is via URL parameters.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Disallow wildcard match in Robots.txt
This is in my robots.txt file, does anyone know what this is supposed to accomplish, it doesn't appear to be blocking URLs with question marks Disallow: /?crawler=1
Technical SEO | | AmandaBridge
Disallow: /?mobile=1 Thank you0 -
Google Webmaster Tools is saying "Sitemap contains urls which are blocked by robots.txt" after Https move...
Hi Everyone, I really don't see anything wrong with our robots.txt file after our https move that just happened, but Google says all URLs are blocked. The only change I know we need to make is changing the sitemap url to https. Anything you all see wrong with this robots.txt file? robots.txt This file is to prevent the crawling and indexing of certain parts of your site by web crawlers and spiders run by sites like Yahoo! and Google. By telling these "robots" where not to go on your site, you save bandwidth and server resources. This file will be ignored unless it is at the root of your host: Used: http://example.com/robots.txt Ignored: http://example.com/site/robots.txt For more information about the robots.txt standard, see: http://www.robotstxt.org/wc/robots.html For syntax checking, see: http://www.sxw.org.uk/computing/robots/check.html Website Sitemap Sitemap: http://www.bestpricenutrition.com/sitemap.xml Crawlers Setup User-agent: * Allowable Index Allow: /*?p=
Technical SEO | | vetofunk
Allow: /index.php/blog/
Allow: /catalog/seo_sitemap/category/ Directories Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /includes/
Disallow: /lib/
Disallow: /magento/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /stats/
Disallow: /var/ Paths (clean URLs) Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
Disallow: /newsletter/
Disallow: /poll/
Disallow: /review/
Disallow: /sendfriend/
Disallow: /tag/
Disallow: /wishlist/
Disallow: /aitmanufacturers/index/view/
Disallow: /blog/tag/
Disallow: /advancedreviews/abuse/reportajax/
Disallow: /advancedreviews/ajaxproduct/
Disallow: /advancedreviews/proscons/checkbyproscons/
Disallow: /catalog/product/gallery/
Disallow: /productquestions/index/ajaxform/ Files Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txt Paths (no clean URLs) Disallow: /.php$
Disallow: /?SID=
disallow: /?cat=
disallow: /?price=
disallow: /?flavor=
disallow: /?dir=
disallow: /?mode=
disallow: /?list=
disallow: /?limit=5
disallow: /?limit=10
disallow: /?limit=15
disallow: /?limit=20
disallow: /*?limit=250 -
Do I have a robots.txt problem?
I have the little yellow exclamation point under my robots.txt fetch as you can see here- http://imgur.com/wuWdtvO This version shows no errors or warnings- http://imgur.com/uqbmbug Under the tester I can currently see the latest version. This site hasn't changed URLs recently, and we haven't made any changes to the robots.txt file for two years. This problem just started in the last month. Should I worry?
Technical SEO | | EcommerceSite0 -
Indexation question
Hi Guys, i have a small problem with our development website. Our development website is website.dev.website.nl This page shouldn't be indexed bij Google but unfortunately it is. What can i do to deindex it and ask google not to index this website. In the robots.txt or are there better ways to do this? Kind regards Ruud
Technical SEO | | RuudHeijnen0 -
Questions about root domain setup
Hi There, I'm a recent addition to SEOmoz and over the past few weeks I've been trying to figure things out. This whole SEO process has been a bit of a brain burner but its slowly becoming a little more clearer. For awhile I noticed that I was unable to get Open Site Explorer to display information about my site. It mentioned that that there was not enough data for the URL. Too recent of a site, no links, etc. Eventually I changed the the URL to include "www." and it pulled up results. I also noticed that a few of my page warnings are because of duplicate page content. One page will be listed as http://enbphotos.com. The other will be listed as http://www.enbphotos.com. I guess I'm not sure what this all means and how to change it. I'm also not really sure what the terminology even is and something regarding root domain seemed appropriate but I'm not sure if it is accurate. Any help/suggestions/links would be appreciated! Thanks, Chris
Technical SEO | | enbphotos0 -
Panda recovery timeframe question
Site was hit by Panda Aug. 22nd. Lost 90% of Google traffic. I know 🙂 We think we found a reason and made few changes to landing pages structure. Updated sitemaps submitted. When can we expect effect (if any) - few days or after next Panda data refresh? Thank you!P.S. What is also interesting, similar traffic loss from Bing/Yahoo happened at exactly the same date. Does that mean Bing is "stealing" search results from Google when can't provide their own relevant results? 🙂
Technical SEO | | LocalLocal0 -
Same URL in "Duplicate Content" and "Blocked by robots.txt"?
How can the same URL show up in Seomoz Crawl Diagnostics "Most common errors and warnings" in both the "Duplicate Content"-list and the "Blocked by robots.txt"-list? Shouldnt the latter exclude it from the first list?
Technical SEO | | alsvik0 -
Where do you go to get a question answered by seomoz?
I thought it was here before but now the seomoz folks don't seem to be responding in this forum. Is there another place to go to get seomoz to answer a question?
Technical SEO | | klkirby0