Should we block URLs like this - domainname/shop/leather-chairs.html?brand=244&cat=16&dir=asc&order=price&price=1 within the robots.txt?
-
I've recently added a campaign within the SEOmoz interface and received an alarming number of errors (~9,000) on our eCommerce website. The site was built in Magento and we are using search-friendly URLs; however, most of our errors were duplicate content / titles caused by URLs like: domainname/shop/leather-chairs.html?brand=244&cat=16&dir=asc&order=price&price=1 and domainname/shop/leather-chairs.html?brand=244&cat=16&dir=asc&order=price&price=4.
Is this hurting us in the search engines? Is rogerbot too good?
What can we do to cut off bots after the ".html?"? Any help would be much appreciated.
-
I had the same problem on http://www.tokenrock.com because I was doing a lot of URL rewriting. It's a CMS I wrote myself, but the same issues apply. According to SEOmoz I went from 7,000+ errors down to 700. Here are a few things I did:
Use canonicals on everything you possibly can.
301 redirect the items in the SERPs that are identical.
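As a rough illustration of the canonical idea, here is a minimal Python sketch. It assumes every query parameter on a category page (brand, cat, dir, order, price) is only a filter or sort option, so all variants can point at the bare page. The www.example.com domain and both function names are made up for the example; this is not Magento's own mechanism.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Drop the query string and fragment so every filtered or
    sorted variant of a page maps to one canonical URL.

    Assumption: no query parameter changes the page's real content.
    """
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> tag to place in the page <head>."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}" />'


# Hypothetical URLs modelled on the ones in the question.
variant_a = ("https://www.example.com/shop/leather-chairs.html"
             "?brand=244&cat=16&dir=asc&order=price&price=1")
variant_b = ("https://www.example.com/shop/leather-chairs.html"
             "?brand=244&cat=16&dir=asc&order=price&price=4")

print(canonical_link_tag(variant_a))
print(canonical_url(variant_a) == canonical_url(variant_b))
```

Both variants resolve to the same canonical URL, which is what tells the crawler to treat them as one page. If some parameter does change the content in a way you want indexed, it would need to be kept rather than stripped.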
I'm not familiar enough with Magento to help you work through that side of it.
Having a link like: domainname/leather-chairs-244-16-price-1.html would work much better.
The URLs you listed are being crawled because somewhere the site itself links to them.
Unfortunately, some CMSs are written by developers who don't fully understand SEO or why the ? is a bad thing.
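To show how a path-style link like the leather-chairs-244-16-price-1.html example could be derived from the query-string version, here is a small Python sketch. The mapping is my guess from that single example (page name, then brand, then cat, then the word "price" and the price value, with the sort options dropped). The function name and the www.example.com domain are hypothetical, and a real setup would also need a rewrite rule on the server to map the new path back to the filtered page.

```python
from urllib.parse import parse_qs, urlsplit


def path_style_url(url: str) -> str:
    """Turn a filtered category URL into a single path-style URL.

    Assumed scheme: <page>-<brand>-<cat>-price-<price>.html at the site
    root. The sort parameters (dir, order) are intentionally ignored.
    """
    parts = urlsplit(url)
    query = parse_qs(parts.query)

    # "/shop/leather-chairs.html" -> "leather-chairs"
    filename = parts.path.rsplit("/", 1)[-1]
    stem = filename[:-len(".html")] if filename.endswith(".html") else filename

    segments = [stem]
    for key in ("brand", "cat"):
        if key in query:
            segments.append(query[key][0])
    if "price" in query:
        segments += ["price", query["price"][0]]

    return f"{parts.scheme}://{parts.netloc}/{'-'.join(segments)}.html"


# Hypothetical URL modelled on the one in the question.
print(path_style_url(
    "https://www.example.com/shop/leather-chairs.html"
    "?brand=244&cat=16&dir=asc&order=price&price=1"
))
```

The old query-string URLs would then be 301 redirected to the address this returns, so only one version of each filtered page remains reachable.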