Should we block URLs like this - domainname/shop/leather-chairs.html?brand=244&cat=16&dir=asc&order=price&price=1 within the robots.txt?
-
I've recently added a campaign within the SEOmoz interface and received an alarming number of errors (~9,000) on our eCommerce website. The site was built in Magento and we are using search-friendly URLs; however, most of our errors were duplicate content / titles due to URLs like: domainname/shop/leather-chairs.html?brand=244&cat=16&dir=asc&order=price&price=1 and domainname/shop/leather-chairs.html?brand=244&cat=16&dir=asc&order=price&price=4.
Is this hurting us in the search engines? Is rogerbot too good?
What can we do to cut off bots after the ".html?" part of the URL? Any help would be much appreciated.
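For illustration, cutting bots off after ".html?" usually comes down to a wildcard rule such as `Disallow: /*.html?`. The sketch below is a rough way to check which paths such a rule would cover. It assumes the crawler treats `*` as "any run of characters" and a trailing `$` as "end of URL", which the major search engine crawlers do; the function names are made up for this example and are not part of any library.

```python
import re


def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern using * and $ into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and let each * match any run of characters.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(path_and_query, disallow_patterns):
    """Return True if any Disallow pattern matches the start of the path."""
    return any(
        robots_pattern_to_regex(pattern).match(path_and_query) is not None
        for pattern in disallow_patterns
    )


# Hypothetical rule: block anything that has a query string after ".html".
DISALLOW = ["/*.html?"]

filtered = "/shop/leather-chairs.html?brand=244&cat=16&dir=asc&order=price&price=1"
clean = "/shop/leather-chairs.html"

print(is_blocked(filtered, DISALLOW))  # True
print(is_blocked(clean, DISALLOW))     # False
```

The clean category page stays crawlable because the rule only matches when a "?" follows ".html".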
-
I had the same problem on http://www.tokenrock.com because I was doing a lot of URL rewriting. It's a CMS I wrote myself, but the same issue applies. I went from 7,000+ errors according to SEOmoz down to 700. Here are a few things I did:
Use canonicals on everything you possibly can.
301-redirect the items in the SERPs that are identical.
I'm not familiar enough with Magento to help you work through that side of it.
Having a link like: domainname/leather-chairs-244-16-price-1.html would work much better.
The ones you have listed show up because somewhere the site itself links to them.
Unfortunately, some CMSs are written by developers who don't fully understand SEO and why the ? is a bad thing.
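As a rough sketch of the "use canonicals" advice: if every query parameter on a category page is only a filter or sort option (an assumption that would need checking for a real Magento store), the canonical URL is simply the page URL with the query string dropped. The helper names and the example.com host below are placeholders for illustration.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url):
    """Drop the query string and fragment, keeping scheme, host and path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def canonical_link_tag(url):
    """Build the <link rel="canonical"> tag to place in the page <head>."""
    return '<link rel="canonical" href="{}" />'.format(
        escape(canonical_url(url), quote=True)
    )


variant_1 = "http://example.com/shop/leather-chairs.html?brand=244&cat=16&dir=asc&order=price&price=1"
variant_4 = "http://example.com/shop/leather-chairs.html?brand=244&cat=16&dir=asc&order=price&price=4"

print(canonical_url(variant_1))  # http://example.com/shop/leather-chairs.html
print(canonical_url(variant_1) == canonical_url(variant_4))  # True
print(canonical_link_tag(variant_4))
```

Both filtered variants collapse to the same canonical URL, which is what tells a crawler to treat them as one page.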