Should all pages on a site be included in either your sitemap or robots.txt?
-
I don't have a specific scenario here, but I'm just curious, as I fairly often come across sites that have, for example, 20,000 pages but only 1,000 in their sitemap. If they only want 1,000 of their URLs included in their sitemap and indexed, should the others be excluded using robots.txt or a page-level exclusion? Or is there a point to having pages that are included in neither, leaving it up to Google to decide?
-
Thanks guys!
-
You bet - Cheers!
-
Clever PHD,
You are correct. I have found that these little housekeeping issues like eliminating duplicate content really do make a big difference.
Ron
-
I think Ron's point was that if you have a bunch of duplicates, the dups are not "real" pages, if you are only counting "real" pages. So if Google indexes your "real" pages plus the duplicate versions of them, you can have more pages indexed than pages actually on your site. The issue then is that you have duplicate versions of the same page in Google's index, so which one will rank for a given key term? You could be competing against yourself. That is why it is so important to deal with crawl issues.
-
Thank you. Just curious, how would the number of pages indexed be higher than the number of actual pages?
-
I think you are looking at the number of pages indexed, which is generally higher than the number of pages actually on your website. There is a point to adding a noindex tag to any pages that you do not want indexed (nofollow only affects links, not indexing), as well as properly marking up the pages that you do specifically want indexed. It is really important that you eliminate duplicate pages. A common source of these duplicates is improper tags on a blog. Make sure that your tags are set up in a logical hierarchy, like your sitemap; this will assist the search engines when they re-index your pages.
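For reference, a minimal sketch of that page-level markup, using the standard robots meta tag in the page head:

    <!-- In the <head> of a page you do NOT want indexed: -->
    <meta name="robots" content="noindex, follow">

    <!-- Indexing is the default, so pages you DO want indexed need no tag at all. -->

The "noindex, follow" combination keeps the page itself out of the index while still letting crawlers follow its links and pass equity through them.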
Hope this helps,
Ron
-
You want to have as many pages in the index as possible, as long as they are high-quality pages with original content. If you publish quality original articles on a regular basis, you want all of those pages indexed. From a practical perspective you may only be able to focus on tweaking the SEO on a portion of them, but if you have good SEO processes in place as you produce those pages, they will rank long term for a broad range of terms and bring in traffic.
If you have 20,000 pages because you have an online catalog with 345 different ways to sort the same set of results, or if you have keyword-search URLs, printer-friendly pages, or shopping cart pages, you do not want those indexed. These pages are typically low-quality/thin-content pages and/or duplicates, and they do you no favors. You would want to use the noindex meta tag or a canonical tag where appropriate. The reality is that out of the 20,000 pages, there is probably only a subset that are the "originals," so you don't want to waste Google's time crawling the rest.
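To illustrate (the URLs here are hypothetical), the markup on those duplicate and thin pages might look like this:

    <!-- On a sorted duplicate such as /catalog?sort=price, point engines at the original: -->
    <link rel="canonical" href="https://www.example.com/catalog">

    <!-- On printer-friendly, keyword-search, or cart pages, keep them out of the index: -->
    <meta name="robots" content="noindex">

Canonical is the right choice when the page is a variation of an original you do want ranked; noindex is for pages that have no business in the index at all.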
A good concept to look up here is Crawl Budget (also called Crawl Optimization):
http://searchengineland.com/how-i-think-crawl-budget-works-sort-of-59768
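And if you did want to go the robots.txt route from the original question to keep crawlers out of whole sections, a minimal sketch (with made-up paths) would be something like:

    # robots.txt (example paths only)
    User-agent: *
    Disallow: /search
    Disallow: /cart/
    Disallow: /print/

Just keep in mind that robots.txt only blocks crawling, not indexing: a blocked URL can still appear in the index if other pages link to it, which is why noindex or canonical is usually the better tool for duplicates.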