Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Block session id URLs with robots.txt
-
Hi,
I would like to block all URLs with the parameter '?filter=' from being crawled by including them in the robots.txt.
Which directive should I use:
User-agent: *
Disallow: ?filter=or
User-agent: *
Disallow: /?filter=In other words, is the forward slash in the beginning of the disallow directive necessary?
Thanks!
-
Hi Martijn,
Thanks for the answer. Regarding the forward slash in the beginning, is it necessary to use this?
In the robots text from Zalando for example, you can see that they don't use it for a lot of filters.
-
Uhh, that's not what the requester is looking for and could actually cause tons of problems if you would apply this on a site that you're unaware of. I would always go with the most limiting robots.txt that you can and in this case, I would go with: /?filter=
-
Hi,
The following should suffice as it will black any URL with a "?" in it
User-agent: * Disallow: /*?
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
If Robots.txt have blocked an Image (Image URL) but the other page which can be indexed has this image, how is the image treated?
Hi MOZers, This probably is a dumb question but I have a case where the robots.tags has an image url blocked but this image is used on a page (lets call it Page A) which can be indexed. If the image on Page A has an Alt tags, then how is this information digested by crawlers? A) would Google totally ignore the image and the ALT tags information? OR B) Google would consider the ALT tags information? I am asking this because all the images on the website are blocked by robots.txt at the moment but I would really like website crawlers to crawl the alt tags information. Chances are that I will ask the webmaster to allow indexing of images too but I would like to understand what's happening currently. Looking forward to all your responses 🙂 Malika
Intermediate & Advanced SEO | Jun 16, 2016, 11:17 AM | Malika11 -
Image URLs - best practice
Hi - I'm assuming image URL best practice follows same principles as non image URLs (not too many files and so on) - I notice alot of web devs putting photos in subdomains, so wonder if I'm missing something (I usually avoid subdomains like the plague)!
Intermediate & Advanced SEO | Aug 10, 2015, 4:35 PM | McTaggart1 -
Disallow URLs ENDING with certain values in robots.txt?
Is there any way to disallow URLs ending in a certain value? For example, if I have the following product page URL: http://website.com/category/product1, and I want to disallow /category/product1/review, /category/product2/review, etc. without disallowing the product pages themselves, is there any shortcut to do this, or must I disallow each gallery page individually?
Intermediate & Advanced SEO | Jun 15, 2015, 4:01 PM | jmorehouse0 -
What is the best URL structure for categories?
A client's site currently uses the URL structure: www.website.com/�tegory%/%postname% Which I think is optimised fairly well, as the categories are keywords being targeted. However, as they are using a category hierarchy, often times the URL looks like this: www.website.com/parent-category/child-category/some-post-titles-are-quite-long-as-they-are-long-tail-terms Best practise often dictates (such as point 3 in this Moz article) that shorter URLs are better for several reasons. So I'm left with a few options: Remove the category from the URL Flatten the category hierarchy Shorten post titles two a word or two - which would hurt my long tail search term traffic. Leave it as it is What do we think is the best route to take? Thanks in advance!
Intermediate & Advanced SEO | Jun 6, 2013, 11:53 AM | underscorelive0 -
Strange URLs, how do I fix this?
I've just check Majestic and have seen around 50 links coming from one of my other sites. The links all look like this: http://www.dwww.mysite.com
Intermediate & Advanced SEO | Apr 11, 2013, 2:51 AM | JohnPeters
http://www.eee.mysite.com
http://www.w.mysite.com The site these links are coming from is a html site. Any ideas whats going on or a way to get rid of these urls? When I visit the strange URLs such as http://www.dwww.mysite.com, it shows the home page of http://www.mysite.com. Is there a way to redirect anything like this back to the home page?0 -
URL Error or Penguin Penalty?
I am currently having a major panic as our website www.uksoccershop.com has been largely dropped from Google. We have not made any changes recently and I am not sure why this is happening, but having heard all sorts of horror stories of penguin update, I am fearing the worst. If you google "uksoccershop" you will see that the homepage does not rank. We previously ranked in the top 3 for "football shirts" but now we don't, although on page 2, 3 and 4 you will see one of our category pages ranking (this didn't used to happen). Some rankings are intact, but many have disappeared completely and in some cases been replaced by other pages on our site. I should point out our existing rankings have been consistently there for 5-6 years until today. I logged into webmaster tools and thankfully there is no warning message from Google about spam, etc, but what we do have is 35,000 URL errors for pages which are accessible. An example of this is: | URL: | http://www.uksoccershop.com/categories/5_295_327.html | | Error details In Sitemaps Linked from Last crawled: 6/20/12First detected: 6/15/12Googlebot couldn't access the contents of this URL because the server had an internal error when trying to process the request. These errors tend to be with the server itself, not with the request. Is it possible this is the cause of the issue (we are not currently sure why the URL's are being blocked) and if so, how severe is it and how recoverable?If that is unlikely to cause the issue, what would you recommend our next move is?All help is REALLY REALLY appreciated 🙂
Intermediate & Advanced SEO | Jun 22, 2012, 3:45 PM | ukss19840 -
Robots.txt is blocking Wordpress Pages from Googlebot?
I have a robots.txt file on my server, which I did not develop, it was done by the web designer at the company before me. Then there is a word press plugin that generates a robots.txt file. How Do I unblock all the wordpress pages from googlebot?
Intermediate & Advanced SEO | Jan 5, 2012, 3:30 PM | ENSO0 -
Brackets in a URL String
Was talking with a friend about this the other day. Do Brackets and or Braces in a URL string impact SEO? (I know short human readable etc... but for the sake of conversation has anyone relaised any impacts of these particular Characters in a URL?
Intermediate & Advanced SEO | Oct 2, 2011, 8:46 PM | AU-SEO0