Should I set up a disallow in the robots.txt for catalog search results?
-
When the crawl diagnostics came back for my site, they showed around 3,000 pages of duplicate content. Almost all of them are catalog search results pages. I also did a site: search on Google, and they have most of the results pages in their index too. I think I should just disallow the bots from the /catalogsearch/ subfolder, but I'm not sure if this will have any negative effects.
-
One step at a time = long term success. I wish you the best with it Jordan.
-
Thanks Alan, you are right, this site has quite a long way to go. The first crawl just finished, and I noticed that most of the errors were due to dupe content, so I decided I would try to tackle that first. Thank you for all the pointers; I will be taking a look at all of those as soon as I can.
-
Totally agree with Alan, it can cause circular navigation problems for crawlers too.
-
Jordan,
Others might have a different view; however, that's exactly what I recommend to clients, but only if you've got other HTML-link-based ways for bots to get to all the content in a direct manner, and a good sitemap.xml file to reinforce that.
I am happy to see that you have a sound overall site architecture; however, I see no robots.txt file at your root, so I'm not sure what's up with that. Also, your sitemap.xml file only has 43 URLs in it. That's a problem, not because Google can't find content by other means, but because I've found Google likes that reinforcement, and Bing especially does a better job discovering content with a proper sitemap.xml submitted through their webmaster system (they're less efficient at discovering content by other means).
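To illustrate, a minimal robots.txt along these lines might look like the sketch below. The /catalogsearch/ path comes from the question; the sitemap location is an assumption (it could live anywhere, as long as the Sitemap directive points at it):

```text
# robots.txt at the site root, e.g. http://www.durafaucet.com/robots.txt
User-agent: *
Disallow: /catalogsearch/

# Reinforce content discovery, per the advice above (assumed sitemap location)
Sitemap: http://www.durafaucet.com/sitemap.xml
```

The Disallow rule blocks crawling of everything under /catalogsearch/, which is why the answer stresses having direct HTML link paths and a sitemap so bots can still reach every product and category page.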
I'd also suggest you have a big push ahead in dealing with near-duplicate content.
For example:
http://www.durafaucet.com/mk850-orb.html
http://www.durafaucet.com/kitchen-faucets/mk850.html
Sure, these are unique products. But there's already so little unique content on either page that the common content, compounded by the site-wide replication of top, sidebar, and footer content, means the total weight of uniqueness is at the very low end of the spectrum.
And then there's the issue of a complete lack of inbound link authority. OpenSiteExplorer.org might be wrong, but it currently shows almost no inbound links. Not only will you need inbound links to the home page, but also to as many inner pages as is realistic in terms of implementation capability. This is especially true for category-level pages. (Include a variety of inbound link anchor text: brand, domain, keyword phrase, and generic text.)
So if you don't address those types of issues, removing all the dupes that show up in search now won't deliver as much long-term value as you'll need.
Related Questions
-
Australian search - ZERO visibility and stumped
Fair warning, this is going to be long, but necessary to explain the situation and what has been done. I will take ANY suggestions, even if I have tried them already. We have a sister site in Australia, targeting Australian traffic. I have inherited what seems to be an incredible rat's nest. I've fixed over two dozen issues, but still haven't seemed to address the root cause. NOTE: Core landing pages have weak keyword targeting. I don't expect much here until I fix this. The main issues I'm trying to resolve first are the unusual US-based targeting and the inability of the homepage to rank for anything. The site is www[dot]castleford[dot]com[dot]au. Here's the rundown on what's going on:
Problems:
- The site ranks for four times as many keywords in the US as it does in Australia.
- The site ranks for a grand total of 5 AU keywords on the first page.
- The homepage, while technically optimized on-page for "content marketing agency" and with content through MarketMuse, has historically ranked between 60-100, despite a fairly strong DA and fairly weak competitors based on Ahrefs and Moz keyword difficulty. Oddly, the ranking has jumped to 5-7 for three-day spurts over the past year.
- Infrequent indexing of the homepage (used to be every 2-3 weeks; I've gotten that down to 1 week).
Sequence of events:
- November 2017: they made some changes to their URLs, some on the blog and some on the top-nav LPs. Redirects seem okay.
- November 2017: substantial number of lost referring domains; not many seem to be quality.
- January 2018: total number of AU ranking keywords more than halved.
- May/June 2018: added a followed sitewide inbound link to an external site that they created; 20k inbound links with the same anchor text to the homepage. The site has a total of 24k inbound links.
- July-Sep 2018: total number of US ranking keywords halved.
- November 10: I walked into this mess.
What's been done:
- Dramatically reduced site load time (it was around 20 seconds).
- Created a sitemap (100-entry batching) and submitted it to GSC.
- Improved the MarketMuse score for the homepage.
- Changed the language from "en-US" to "en-AU".
- Fetch and render: content is all crawlable and indexed properly.
- Changed the site architecture for top-nav core landing pages to establish a clear hierarchy.
- All versions of GSC created: non-www and www http, and non-www and www https.
- Site crawl: normal amount of 404s, nothing stands out as substantial.
- http to https redirect okay.
- Robots.txt updated and okay.
- Checked GSC international targeting; confirmed AU.
- No manual links penalty.
I'm clearly stumped and could use some insights. Thanks to everyone in advance, if you can find time.
Technical SEO | | Brafton-Marketing0 -
Should a login page for a payroll / timekeeping company be nofollow / disallowed in robots.txt?
I am managing a Timekeeping/Payroll company. My question is about the customer login page. Would this typically be nofollow for robots?
Technical SEO | | donsilvernail0 -
Do I need a separate robots.txt file for my shop subdomain?
Hello Mozzers! Apologies if this question has been asked before, but I couldn't find an answer, so here goes... Currently I have one robots.txt file hosted at https://www.mysitename.org.uk/robots.txt We host our shop on a separate subdomain, https://shop.mysitename.org.uk Do I need a separate robots.txt file for my subdomain? (Some Google searches are telling me yes and some no, and I've become awfully confused!)
Technical SEO | | sjbridle0 -
How to set up redirects with a company takeover
Hi there, We are about to take over a player in the market with some good DA and PA scores. We chose to redirect all the pages from the domain we're taking over to our main domain for now; later we want to redirect all categories to relevant, similar categories on our own domain. The company we're taking over uses a server which will be cancelled in a while. For now we've set up the 301 redirect(s) on the server we're taking over. Because of the extra costs, we will cancel that server in a few weeks/months. What is a common way to keep 301 redirects alive after cancelling the server of the company we take over? I hope someone can help me with this one. Thanks in advance! Cheers,
Technical SEO | | MarcelMoz
Marcel0 -
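For reference, the usual pattern for the situation described above is to point the old domain's DNS at a server you do control and serve the 301s from there, so the redirects survive the old server's cancellation. A hypothetical Apache .htaccess sketch (both domain names are placeholders, not from the question):

```apache
# .htaccess on a server you control that now answers for the acquired domain.
# Enable the rewrite engine and match requests arriving on the old hostname.
RewriteEngine On
RewriteCond %{HTTP_HOST} ^(www\.)?old-domain\.example$ [NC]
# Permanently redirect every path to the same path on the main domain.
RewriteRule ^(.*)$ https://www.main-domain.example/$1 [R=301,L]
```

The same effect can be had from a cheap static host or a dedicated redirect service; what matters is that the old domain keeps resolving somewhere that answers with the 301s.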
Removal request for entire catalog. Can be done without blocking in robots?
A bunch of thin-content (catalog) pages were modified with "follow, noindex" a few weeks ago. The site has been completely re-crawled, and the related cache shows that these pages were not indexed again. So that's good, I suppose 🙂 But all of them are still in Google's main index and show up from time to time in SERPs. Will they eventually disappear, or do we need to submit a removal request? The problem is we really don't want to add these pages to robots.txt (they are passing link juice down to product pages). Thanks!
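The "follow, noindex" change the question describes is a meta robots tag on each catalog page, along these lines:

```html
<!-- In the <head> of each thin catalog page: keep it out of the index,
     but let crawlers follow its links so link equity still flows down
     to the product pages -->
<meta name="robots" content="noindex, follow">
```

This is exactly why the pages must stay out of robots.txt: if crawling were blocked, Googlebot could never see the noindex directive or follow the links.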
Technical SEO | | LocalLocal0 -
Is my robots.txt file working?
Greetings from medieval York, UK 🙂 Every time you enter my name & Liz, this page is returned in Google:
Technical SEO | | Nightwing
http://www.davidclick.com/web_page/al_liz.htm But I have the following robots.txt file, which has been in place a few weeks:
User-agent: *
Disallow: /york_wedding_photographer_advice_pre_wedding_photoshoot.htm
Disallow: /york_wedding_photographer_advice.htm
Disallow: /york_wedding_photographer_advice_copyright_free_wedding_photography.htm
Disallow: /web_page/prices.htm
Disallow: /web_page/about_me.htm
Disallow: /web_page/thumbnails4.htm
Disallow: /web_page/thumbnails.html
Disallow: /web_page/al_liz.htm
Disallow: /web_page/york_wedding_photographer_advice.htm
Allow: /
So my question is, please: "Why is this page appearing in the SERPs when it's blocked in the robots.txt file, e.g. Disallow: /web_page/al_liz.htm?" Any insights welcome 🙂0 -
Robots.txt Question
In the past, I had blocked a section of my site (i.e. domain.com/store/) by placing the following in my robots.txt file: "Disallow: /store/" Now, I would like the store to be indexed and included in the search results. I have removed the "Disallow: /store/" from the robots.txt file, but approximately one week later a Google search for the URL produces the following meta description in the search results: "A description for this result is not available because of this site's robots.txt – learn more" Is there anything else I need to do to speed up the process of getting this section of the site indexed?
Technical SEO | | davidangotti0 -
My organic search results are down 16% since the Penguin update 4/24
Penguin has affected my search results; they are down 16%. When I look at my SEOmoz scan, the only problem I see is "too many on-page links". The problem is that my blog archive for each month is considered one page, e.g. in August 2007 I wrote many posts, and the total on-page links was 106, but that included all the posts written in that month. The other problem area is duplicate content. I thought Penguin was after "link farming", which I do not do. Any advice on how I can correct this? Brooke
Technical SEO | | wianno1680