How to exclude URL filter searches in robots.txt
-
When I look through my Moz reports I can see they include 'pages' that shouldn't have been included, i.e. URLs created by filtering rules, such as http://www.mydomain.com/brands?color=364&manufacturer=505
How can I exclude all of these filters in the robots.txt? I think it'll be:
Disallow: /*?color=$
Is that the correct syntax with the $ sign in it? Thanks!
-
Unless you're specifically calling out Bing or Baidu in your robots.txt file, they should follow the same directives as Google, so testing with Google's robots.txt tester should suffice for all of them.
-
Yes, but what about Bing and the rest of the search engines?
-
Adrian,
I agree that there certainly is a right answer to the question as posted, since the question asks specifically about one way to manage the issue: blocking the filters in the robots.txt file. What I was getting at is that this may or may not be the "best" way, and that I'd need to look at your site and your unique situation to figure out which would be the best solution for your needs.
It is very likely that with these parameters a robots.txt block is the best approach, assuming the parameters aren't added by default into category page or category pagination navigational links, as then it would affect the bot's ability to crawl the site. Also, if people are linking to those URLs (highly unlikely, though) you may consider a robots meta noindex,follow tag instead, so the PageRank can flow to other pages.
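For reference, a robots meta tag like the one mentioned above would sit in the `<head>` of each filtered page (a sketch; where exactly you add it depends on your store's template system):

```html
<!-- Keeps the filtered page out of the index while still letting
     crawlers follow its links, so link equity can flow onward -->
<meta name="robots" content="noindex,follow">
```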
And I'm not entirely sure the code you provided above will work if the blocked parameter is the first one in the string (e.g. domain.com/category/?color=red) as there is the additional wildcard between the ? and the parameter. I would advise testing this in Google Webmaster Tools first.
- On the Webmaster Tools Home page, click the site you want.
- Under Crawl, click Blocked URLs.
- If it's not already selected, click the Test robots.txt tab.
- Copy the content of your robots.txt file, and paste it into the first box.
- In the URLs box, list the site to test against.
- In the User-agents list, select the user-agents you want (e.g. Googlebot).
-
There certainly is a right answer to my question - I already posted it here earlier today:
Disallow: /*?color=
Disallow: /*?manufacturer=
Without the $ at the end, which would otherwise denote the end of the URL.
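Put together, the relevant part of the robots.txt file would look something like this (assuming the rules should apply to all crawlers; swap in a specific user-agent if you only want to restrict one bot):

```
User-agent: *
Disallow: /*?color=
Disallow: /*?manufacturer=
```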
-
Hello Adrian,
The Moz reports are meant to help you uncover issues like this. If you're seeing non-canonical URLs in the Moz report then there is a potential issue for Google, Bing and other search engines as well.
Google does respect wildcards (*) in the robots.txt file, though they can easily be done wrong. There is no right or wrong answer to the issue of using filters or faceted navigation, as each circumstance is going to be different. However, I hope some of these articles will help you identify the best approach for your needs:
(Note: faceted navigation is not exactly the same as category filters, but the issues and possible solutions are very similar.)
- Building Faceted Navigation That Doesn't Suck
- Faceted Navigation Whiteboard Friday
- Duplicate Content: Block, Redirect or Canonical
- Guide to eCommerce Facets, Filters and Categories
- Rel Canonical How To and Why Not
- Moz.com Guide to Duplicate Content
I don't know how your store handles these (e.g. does it add the filter automatically, or only when a user selects a filter?) so I can't give you the answer, but I promise that if you read the articles above you will have a very good understanding of all of the options, so you can choose which is best for you. That might end up being as simple as blocking the filters in your robots.txt file, or you may opt for rel canonical, a noindex meta tag, AJAX, Google parameter handling, etc.
Good luck!
-
It's not Google's index that I'm interested in in this case; it's the Moz reports. Moz was including over 10,000 'pages' because it was indexing these URLs. Now that I know how to edit the robots.txt file, Moz will be prevented from indexing them again (we only have around 2,000 real pages, not 10,000).
-
I sought out the answer from a developer and got the following reply, so posting here in case it helps someone else:
To exclude pages with color or manufacturer in them you can use:
Disallow: /*?color=
Disallow: /*?manufacturer=
The $ sign in your try should be omitted, as it denotes the end of the URL.
-
Hi
I would recommend excluding these in Google Webmaster Tools. Once logged in to your account under the "Crawl" menu you will find "URL Parameters". Find the relevant parameter in the list on this page and you can tell Google not to index these pages.
Hope this helps.
Steve