Should I set up a disallow in the robots.txt for catalog search results?
-
When the crawl diagnostics came back for my site, they showed around 3,000 pages of duplicate content. Almost all of them are catalog search results pages. I also did a site search on Google, and they have most of the results pages in their index too. I think I should just disallow bots from the /catalogsearch/ subfolder, but I'm not sure whether this will have any negative effect.
-
One step at a time = long term success. I wish you the best with it Jordan.
-
Thanks Alan, you are right, this site has quite a long way to go. The first crawl just finished, and I noticed that most of the errors were due to duplicate content, so I decided to tackle that first. Thank you for all the pointers; I will be taking a look at all of them as soon as I can.
-
Totally agree with Alan. Crawlable search result pages can cause circular navigation problems for crawlers too.
-
Jordan,
Others might have a different view; however, that's exactly what I recommend to clients, but only if you've got other HTML link-based ways for bots to reach all the content directly, and a good sitemap.xml file to reinforce that.
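As a rough illustration of the rule being discussed, here is a minimal sketch that checks a candidate robots.txt with Python's standard `urllib.robotparser` before it goes live. The `/catalogsearch/` path comes from the question; the sample search-result URL and its query string are assumptions made up for the check.

```python
from urllib.robotparser import RobotFileParser

# Candidate robots.txt content: block all bots from the catalog search folder.
ROBOTS_TXT = """\
User-agent: *
Disallow: /catalogsearch/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text in memory, without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Hypothetical search-result URL (assumed shape, not taken from the site).
search_url = "http://www.durafaucet.com/catalogsearch/result/?q=faucet"
# Product URL quoted in this thread.
product_url = "http://www.durafaucet.com/kitchen-faucets/mk850.html"

search_allowed = parser.can_fetch("Googlebot", search_url)
product_allowed = parser.can_fetch("Googlebot", product_url)

print("search results crawlable:", search_allowed)
print("product page crawlable:", product_allowed)
```

With this rule the search URL should come back as not crawlable while the product page stays crawlable, which is the behaviour you want as long as the product pages are also reachable through regular links.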
I am happy to see that you have a sound overall site architecture; however, I see no robots.txt file at your root, so I'm not sure what's up with that. Also, your sitemap.xml file only has 43 URLs in it. That's a problem not because Google can't find content by other means, but because I've found Google likes that reinforcement, and Bing especially does a better job discovering content with a proper sitemap.xml submitted through their webmaster system (they're less efficient at discovering content by other means).
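For anyone who needs to regenerate a fuller sitemap.xml, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The namespace is the one defined by the sitemaps.org protocol; the `build_sitemap` helper is a hypothetical name, and the two URLs are simply the product pages mentioned in this thread standing in for a full URL list.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap.xml document (as a string) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Stand-in list; a real run would pass every indexable URL on the site.
sitemap_xml = build_sitemap([
    "http://www.durafaucet.com/mk850-orb.html",
    "http://www.durafaucet.com/kitchen-faucets/mk850.html",
])
print(sitemap_xml)
```

The output can be saved as sitemap.xml at the site root and then submitted through the Google and Bing webmaster tools.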
I'd also suggest you have a big push ahead in dealing with near-duplicate content.
For example:
http://www.durafaucet.com/mk850-orb.html
http://www.durafaucet.com/kitchen-faucets/mk850.html
Sure, these are unique products. But there's so little unique content on either page that the shared content, compounded by the site-wide replication of top, sidebar and footer content, leaves the total weight of uniqueness at the very minor end of the spectrum.
And then there's the issue of a complete lack of inbound link authority. OpenSiteExplorer.org might be wrong, but it currently shows almost no inbound links. Not only will you need inbound links to the home page, but also to as many inner pages as is realistic given your implementation capabilities. This is especially true for category-level pages, and it includes a variety of inbound link anchor text: brand, domain, keyword phrase and generic text.
So if you don't address those types of issues, removing all the dupes that show up in search now won't result in as much long-term value as you'll need.