Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
What do you add to your robots.txt on your ecommerce sites?
-
We're looking at expanding our robots.txt, we currently don't have the ability to noindex/nofollow. We're thinking about adding the following:
- Checkout
- Basket
Then possibly:
- Price
- Theme
- Sortby
- other misc filters.
What do you include?
-
I'm on this same path since we too cannot use noindex / nofollow due to limited backend interaction with Bigcommerce.
I like to block all cart related pages, which for ecommerce sites can be a boat load.
- /cart.php
- /checkout.php
- /finishorder.php
- /*login.php
just to name a few, then you have the sorting and compare pages, they have to be blocked or a mess unfolds.
- Disallow: /*sort=newest
- Disallow: /*sort=bestselling
- Disallow: /*?page= ( Big duplicate page issue if you don't block this one with a wildcard, and cannot access your .htaccess file or the backend properly to noindex / nofollow )
Just to name a few, in my case, I only want the meat of the site to be indexed and rank for. Otherwise one client's site was ranking terms that more related to web development than the niche industry they lived in. Plus with a limited index budget, why would you want google or anyone else to crawl pages on your site with no SEO value towards your niche?
Unless you sold carts as in web developed carts for ecommerce sites you wouldn't want much of that indexed anyways, and even in that case, those pages aren't too useful for ranking. At least from what I've gathered in the niche industries.
-
Hi,
It sounds like you're going down the right path. Disallow and section of the site that has personal information, as there's no value in having bots crawl that, keep them on important content longer! In addition to Checkout and Basket/Cart, you should also disallow the My Account area if your site has one.
Your next grouping, I'm assuming these are the parameters by which you pages can be sorted. If so, yes, disallow all of those, they're only going to cause duplicate content flags for you in the future. I'm not sure which CMS you are using, but some eComm platforms also have 'email to a friend' URLs that are a major source for dupes and can often be identified and disallowed by another parameter.
Hope this helps narrow it down for you!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
If I block a URL via the robots.txt - how long will it take for Google to stop indexing that URL?
If I block a URL via the robots.txt - how long will it take for Google to stop indexing that URL?
Intermediate & Advanced SEO | | Gabriele_Layoutweb0 -
Ecommerce Site - Duplicate product descriptions & SKU pages
Hi I have a couple of questions regarding the best way to optimise SKU pages on a large ecommerce site. At the moment we have 2 landing pages per product - one is the primary landing page with no SKU, the other includes the SKU in the URL so our sales people & customers can find it when using the search facility on the site. The SKU landing page has a canonical pointing to the primary page as they're duplicates. Is this the best way? Or is it better to have the one page with the SKU in the URL? Also, we have loads of products with the very similar product descriptions, I am working on trying to include a unique paragraph or few sentences on these to improve the content - how dangerous is the duplicate content within your own site? I know its best to have totally unique content, but it won't be possible on a site with thousands of products and a small team. At the moment I am trying to prioritise the products to update. Thank you 🙂
Intermediate & Advanced SEO | | BeckyKey0 -
Merging Niche Site
I posted a question about this a while ago, but still haven't pulled the trigger. I have a main site (bobsclothing.com). I also have a EM niche site (i.e shirtsmall.com). It would be more efficient for me to merge these site, because: I would have to manage content, promos, etc. on a single site. In other words, I can focus efforts on 1 site. If I am writing content, I don't have to split the work. I don't have to worry about duplicate content. Right now, if I enter a product URL into copyscape, the other sites is returned for many products. What makes me apprehensive are: The niche site actually ranks for more keywords than the main site, although it has lower revenue. Slightly lower PA, and DA. Niche site ranks top 20 for a profitable keyword that has about 1300 exact match searches. If you include the longer tail versions of the keyword it would be more. If I merge these sites, and do proper 301s (product to product, category to category) how likely is it that main site will still rank for that keyword? Am I likely to end up with a site that has stronger DA? Am I better off keeping the niche site and just focusing content efforts on the few keywords that it can rank well for? I appreciate any advice. If someone has done this, please share your experience. TIA
Intermediate & Advanced SEO | | inhouseseo0 -
Should I use meta noindex and robots.txt disallow?
Hi, we have an alternate "list view" version of every one of our search results pages The list view has its own URL, indicated by a URL parameter I'm concerned about wasting our crawl budget on all these list view pages, which effectively doubles the amount of pages that need crawling When they were first launched, I had the noindex meta tag be placed on all list view pages, but I'm concerned that they are still being crawled Should I therefore go ahead and also apply a robots.txt disallow on that parameter to ensure that no crawling occurs? Or, will Googlebot/Bingbot also stop crawling that page over time? I assume that noindex still means "crawl"... Thanks 🙂
Intermediate & Advanced SEO | | ntcma0 -
How to Disallow Tag Pages With Robot.txt
Hi i have a site which i'm dealing with that has tag pages for instant - http://www.domain.com/news/?tag=choice How can i exclude these tag pages (about 20+ being crawled and indexed by the search engines with robot.txt Also sometimes they're created dynamically so i want something which automatically excludes tage pages from being crawled and indexed. Any suggestions? Cheers, Mark
Intermediate & Advanced SEO | | monster990 -
Splitting a Site into Two Sites for SEO Purposes
I have a client that owns a business that really could be easily divided into two separate business in terms of SEO. Right now his web site covers both divisions of his business. He gets about 5500 visitors a month. The majority go to one part of his business and around 600 each month go to the other. So about 11% I'm considering breaking off this 11% and putting it on an entirely different domain name. I think I could rank better for this 11%. The site would only be SEO'd for this particular division of the company. The keywords would not be in competition with each other. I would of course link the two web sites and watch that I don't run into any duplicate content issues. I worry about placing the redirects from the pages that I remove to the new pages. I know Google is not a fan of redirects. Then I also worry about the eventual drop in traffic to the main site now. How big of a factor is traffic in rankings? Other challenges include that the business services 4 major metropolitan areas. Would you do this? Have you done this? How did it work? Any suggestions?
Intermediate & Advanced SEO | | MSWD0 -
How To Best Close An eCommerce Site?
We're closing down one of our eCommerce sites. What is the best approach to do this? The site has a modest link profile (a young site). It does have a run of site link to the parent site. It also has a couple hundred email subscribers and established accounts. Is there a gradual way to do this? How do I treat the subscribers and account holders? The impact won't be great, but I want to minimize collateral damage as much as possible. Thanks.
Intermediate & Advanced SEO | | AWCthreads0 -
Blocking Dynamic URLs with Robots.txt
Background: My e-commerce site uses a lot of layered navigation and sorting links. While this is great for users, it ends up in a lot of URL variations of the same page being crawled by Google. For example, a standard category page: www.mysite.com/widgets.html ...which uses a "Price" layered navigation sidebar to filter products based on price also produces the following URLs which link to the same page: http://www.mysite.com/widgets.html?price=1%2C250 http://www.mysite.com/widgets.html?price=2%2C250 http://www.mysite.com/widgets.html?price=3%2C250 As there are literally thousands of these URL variations being indexed, so I'd like to use Robots.txt to disallow these variations. Question: Is this a wise thing to do? Or does Google take into account layered navigation links by default, and I don't need to worry. To implement, I was going to do the following in Robots.txt: User-agent: * Disallow: /*? Disallow: /*= ....which would prevent any dynamic URL with a '?" or '=' from being indexed. Is there a better way to do this, or is this a good solution? Thank you!
Intermediate & Advanced SEO | | AndrewY1